Virtual Window

ABSTRACT

Novel tools and techniques are provided for displaying video. In some embodiments, novel tools and techniques might be provided for sensing the presence and/or position of a user in a room, and/or for customizing displayed content (including video call content, media content, and/or the like) based on the sensed presence and/or position of the user. In particular, in some aspects, a user device (which might include, without limitation, a video calling device, an image capture device, a gaming console, etc.) might determine a position of a user relative to a display device in communication with the user device. The user device and/or a control server (in communication with the user device over a network) might adjust an apparent view of video or image(s) displayed on the display device, based at least in part on the determined position of the user relative to the display device.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a continuation application of U.S. patent application Ser. No. 14/479,169, filed Sep. 5, 2014 by Shoemake et al. and titled “Virtual Window” (attorney docket no. 0414.11, referred to herein as the “'169 application”), which claims the benefit, under 35 U.S.C. §119(e), of provisional U.S. Patent Application No. 61/874,903, filed Sep. 6, 2013 by Shoemake et al. and titled “Virtual Window” (attorney docket no. 0414.11-PR, referred to herein as the “'903 application”), the entire teachings of both of which are incorporated herein by reference.

The '169 application is also a continuation-in-part of U.S. patent application Ser. No. 14/106,263, filed on Dec. 13, 2013 by Shoemake et al. and titled “Video Capture, Processing and Distribution System” (attorney docket no. 0414.06, referred to herein as the “'263 application”), which claims the benefit of provisional U.S. Patent Application No. 61/737,506, filed Dec. 14, 2012 by Shoemake et al. and titled “Video Capture, Processing and Distribution System” (attorney docket no. 0414.06-PR, referred to herein as the “'506 application”). The '169 application is also a continuation-in-part of U.S. patent application Ser. No. 14/170,499, filed on Jan. 31, 2014 by Shoemake et al. and titled “Video Mail Capture, Processing and Distribution” (attorney docket no. 0414.07, referred to herein as the “'499 application”), which claims the benefit of provisional U.S. Patent Application No. 61/759,621, filed Feb. 1, 2013 by Shoemake et al. and titled “Video Mail Capture, Processing and Distribution” (attorney docket no. 0414.07-PR, referred to herein as the “'621 application”). The '169 application is also a continuation-in-part of U.S. patent application Ser. No. 14/341,009, filed on Jul. 25, 2014 by Shoemake et al. and titled “Video Calling and Conferencing Addressing” (attorney docket no. 0414.08, referred to herein as the “'009 application”), which claims the benefit of provisional U.S. Patent Application No. 61/858,518, filed Jul. 25, 2013 by Shoemake et al. and titled “Video Calling and Conferencing Addressing” (attorney docket no. 0414.08-PR, referred to herein as the “'518 application”). The '169 application is also a continuation-in-part of U.S. patent application Ser. No. 14/472,133, filed on Aug. 28, 2014 by Ahmed et al. and titled “Physical Presence and Advertising” (attorney docket no. 0414.10, referred to herein as the “'133 application”), which claims the benefit of provisional U.S. Patent Application No. 61/872,603, filed Aug. 30, 2013 by Ahmed et al. and titled “Physical Presence and Advertising” (attorney docket no. 0414.10-PR, referred to herein as the “'603 application”). The '169 application is also a continuation-in-part of U.S. patent application Ser. No. 14/106,279, filed on Dec. 13, 2013 by Ahmed et al. and titled “Mobile Presence Detection” (attorney docket no. 0414.12, referred to herein as the “'279 application”), which claims the benefit of provisional U.S. Patent Application No. 61/877,928, filed Sep. 13, 2013 by Ahmed et al. and titled “Mobile Presence Detection” (attorney docket no. 0414.12-PR, referred to herein as the “'928 application”). The '169 application is also a continuation-in-part of U.S. patent application Ser. No. 14/106,360 (now U.S. Pat. No. 8,914,837), filed on Dec. 13, 2013 by Ahmed et al. and titled “Distributed Infrastructure” (attorney docket no. 0414.13, referred to herein as the “'360 application”). The '169 application is also a continuation-in-part of U.S. patent application Ser. No. 14/464,435, filed Aug. 20, 2014 by Shoemake et al. and titled “Monitoring, Trend Estimation, and User Recommendations” (attorney docket no. 0414.09, referred to herein as the “'435 application”).

This application may also be related to provisional U.S. Patent Application No. 61/987,304, filed May 1, 2014 by Shoemake et al. and titled “Virtual Remote Functionality” (attorney docket no. 0414.15-PR, referred to herein as the “'304 application”).

The respective disclosures of these applications/patents (which this document refers to collectively as the “Related applications”) are incorporated herein by reference in their entirety for all purposes.

COPYRIGHT STATEMENT

A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

FIELD

The present disclosure relates, in general, to tools and techniques for implementing video communications or presenting media content, and, more particularly, to tools and techniques for sensing the presence and/or position of a user in a room, and/or for customizing displayed content (including video call content, media content, and/or the like) based on the sensed presence and/or position of the user.

BACKGROUND

The proliferation of capable user devices, pervasive communication, and increased bandwidth has provided opportunity for many enhanced services for users. One example is video calling. Once the domain of high-end, dedicated systems from vendors such as POLYCOM®, video calling has become available to the average consumer at a reasonable cost. For example, the Biscotti™ device, available from Biscotti, Inc., provides an inexpensive tool to allow video calling using a high-definition television and an Internet connection. More generally, a class of devices, which have been described as “video calling devices” but are referred to herein as video communication devices (“VCDs”), can be simultaneously connected to a display (such as a television, to name one example) and a source of content (such as a set-top box (“STB”), to name an example) in a pass-through configuration and can have a network connection and/or sensors such as a camera, a microphone, infrared sensors, and/or other suitable sensors. Such devices present a powerful platform for various applications. Examples include, without limitation, video calling, instant messaging, presence detection, status updates, media streaming over the Internet, web content viewing, gaming, and DVR capability. Another example of such value added services is the introduction of online gaming. Rather than playing a game by him- or herself, a user now can play most games in a multiplayer mode, using communication over the Internet or another network.

Enabling such services is a new class of user device, which generally features relatively high-end processing capability (which would have been unthinkable outside supercomputing labs just a few years ago), substantial random access memory, and relatively vast non-transient storage capabilities, including hard drives, solid state drives, and the like. Such user devices can include, without limitation, the VCDs mentioned above, the presence detection devices (“PDDs”) described in the '279 application, various video game consoles, and the like. Such devices generally have a reliable, and relatively high-speed, connection to the Internet (to enable the value added services) and significant amounts of downtime, in which the processing and other capabilities of the devices are unused.

In the context of video communications, while some existing devices provide inexpensive ways for a user to engage in video calls, the entire field of video calling (and viewing video generally) traditionally tends to be static, in the sense that the image viewed does not change with the position of the viewer. This is very much unlike a real-life experience. For example, when a person looks through a window, what that person sees through the window changes depending on the person's perspective relative to the window. If the person gets closer to the window, he or she has a broader field of view of the scene on the other side of the window (i.e., can see more of the area on the other side of the window). Conversely, if the person moves farther away, he or she has a narrower field of view. If a person moves to the right relative to the window, the field of view will shift toward the left, and so forth. In conventional video communications (including, without limitation, video calling as well as other video communications, such as television and video gaming), the fact that the image does not change with the position of the viewer makes the interaction feel less lifelike and less real.
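
By way of a concrete illustration (not drawn from the claims), the window geometry described above follows from similar triangles: the width of the far scene visible through a window grows as the viewer approaches the window plane. A minimal sketch in Python, with all names and values chosen purely for illustration:

    def visible_scene_width(window_width, viewer_distance, scene_depth):
        """Width of the far scene visible through a window (same units throughout).

        window_width:    physical width of the window aperture
        viewer_distance: distance from the viewer's eyes to the window plane
        scene_depth:     distance from the window plane to the far scene
        """
        # Similar triangles: rays from the eye through the window edges
        # spread apart as they travel the extra scene_depth.
        return window_width * (viewer_distance + scene_depth) / viewer_distance

    # A viewer 1 m from a 1 m-wide window, scene 5 m beyond it:
    print(visible_scene_width(1.0, 1.0, 5.0))  # 6.0 -- broad field of view
    # The same viewer backed off to 4 m:
    print(visible_scene_width(1.0, 4.0, 5.0))  # 2.25 -- narrower field of view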

Hence, there is a need for solutions that allow for more flexible and robust display and apparent view functionalities based on presence and position information of a user, and some such solutions can employ the powerful user devices already resident in many users' homes.

BRIEF SUMMARY

A set of embodiments provides tools and techniques to enable more lifelike audio and video communications (including, without limitation, audio/video calls, video games, media content, etc.), in which the images seen on a display device and/or the audio played through one or more speakers change based on the position of the viewer relative to the display device/speakers. In one aspect, certain embodiments can provide this functionality by being aware of the position or location of the viewer (or the viewer's eyes) via various means and adjusting the image (and/or audio) that is presented to the viewer in response to that position.

In some embodiments, novel tools and techniques might be provided for sensing the presence and/or position of a user in a room, and/or for customizing displayed content (including video call content, media content, and/or the like) based on the sensed presence and/or position of the user. In particular, in some aspects, a user device (which might include, without limitation, a video calling device, an image capture device, a gaming console, etc.) might determine a position of a user relative to a display device in communication with the user device. The user device and/or a control server (in communication with the user device over a network) might adjust an apparent view of video or image(s) displayed on the display device, based at least in part on the determined position of the user relative to the display device.

In some cases, adjusting an apparent view of the video or image(s) might comprise one or more of adjusting an apparent field of view of the video or image(s) and/or adjusting an apparent perspective of the video or image(s). In some instances, the video or image(s) displayed on the display device might comprise one of a video program, a television program, movie content, video media content, audio media content, game content, or image content, and/or the like.

The techniques described herein can also be employed in a variety of video calling environments, and with a variety of different hardware and software configurations. Merely by way of example, these techniques can be used with video calling devices and systems described in detail in U.S. patent application Ser. No. 12/561,165, filed Sep. 16, 2009 by Shoemake et al. and titled “Real Time Video Communications System” (issued as U.S. Pat. No. 8,144,182) and in the '304, '360, '279, '928, '903, '133, '603, '435, '009, '518, '499, '621, '263, and '506 applications, each of which is incorporated by reference, as if set forth in full in this document, for all purposes.

The tools provided by various embodiments include, without limitation, methods, systems, and/or software products. Merely by way of example, a method might comprise one or more procedures, any or all of which are executed by an image capture device (“ICD”), a presence detection device (“PDD”), and/or a computer system. Correspondingly, an embodiment might provide an ICD, a PDD, and/or a computer system configured with instructions to perform one or more procedures in accordance with methods provided by various other embodiments. Similarly, a computer program might comprise a set of instructions that are executable by an ICD, a PDD, and/or a computer system (and/or a processor therein) to perform such operations. In many cases, such software programs are encoded on physical, tangible, and/or non-transitory computer readable media (such as, to name but a few examples, optical media, magnetic media, and/or the like).

In an aspect, a method might comprise determining, with a user device comprising a camera, a position of a user relative to a display device in communication with the user device. The method might further comprise adjusting an apparent view of video on the display device in response to the determined position of the user relative to the display device.

According to some embodiments, adjusting an apparent view of video on the display device might comprise adjusting an apparent field of view of the video to correspond to the determined position of the user relative to the display device. In some cases, adjusting an apparent view of video on the display device might comprise adjusting an apparent perspective of the video to correspond to the determined position of the user relative to the display device.

In some embodiments, the user device might comprise a video calling device, and the video on the display device might comprise a video call. In some instances, the user device might comprise a video game console, and the video on the display device might comprise a video game. According to some embodiments, the video on the display device might comprise one of a video program, a television program, movie content, video media content, audio media content, game content, or image content. In some cases, the video on the display device might comprise a live video stream captured by a camera in a location remote from the user device. Merely by way of example, in some instances, the method might further comprise adjusting an audio track of the video in response to the determined position of the user relative to the display device.

In another aspect, a user device might comprise a sensor, a processor, and a computer readable medium having encoded thereon a set of instructions executable by the processor to cause the user device to perform one or more operations. The set of instructions might comprise instructions for determining a position of a user relative to a display device in communication with the user device and instructions for adjusting an apparent view of video on the display device in response to the determined position of the user relative to the display device. According to some embodiments, the user device might comprise the display device.

In yet another aspect, a method might comprise determining, with a video calling device, a position of a first party to a video call relative to a display device that displays video of the video call. The method might further comprise adjusting an apparent view of the video call, based at least in part on the determined position of the first party to the video call.

In some embodiments, the video calling device might comprise a video input interface to receive video input from a set-top box, an audio input interface to receive audio input from the set-top box, a video output interface to provide video output to the display device, an audio output interface to provide audio output to an audio receiver, a video capture device to capture video, an audio capture device to capture audio, a network interface, at least one processor, and a storage medium in communication with the at least one processor. The storage medium might have encoded thereon a set of instructions executable by the at least one processor to control operation of the video calling device. The set of instructions might comprise instructions for controlling the video capture device to capture a captured video stream, instructions for controlling the audio capture device to capture a captured audio stream, instructions for encoding the captured video stream and the captured audio stream to produce a series of data packets, and instructions for transmitting the series of data packets on the network interface for reception by a second video calling device.
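
The capture-encode-packetize-transmit pipeline recited above can be sketched, purely for illustration, as follows; the ToyEncoder stand-in and the packet header layout are assumptions made for the example, not the claimed device's actual codec or wire format:

    import socket
    import struct
    import zlib

    class ToyEncoder:
        """Illustrative stand-in for a real audio/video encoder."""
        def encode(self, frame: bytes) -> bytes:
            return zlib.compress(frame)

    def send_stream(frames, encoder, sock, addr, seq=0):
        """Encode captured frames and transmit them as sequence-numbered UDP packets."""
        for frame in frames:
            payload = encoder.encode(frame)
            header = struct.pack("!IH", seq & 0xFFFFFFFF, len(payload))
            sock.sendto(header + payload, addr)
            seq += 1
        return seq

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    fake_frames = [b"frame-data" * 20 for _ in range(3)]  # stand-in for captured video
    send_stream(fake_frames, ToyEncoder(), sock, ("127.0.0.1", 5004))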

In some cases, adjusting an apparent view of the video call might comprise adjusting an apparent field of view of the video call. In some instances, determining a position of a first party might comprise determining a distance of the first party from the display device. According to some embodiments, adjusting an apparent field of view of the video might comprise zooming the video based on the determined distance of the first party from the display device. In some embodiments, determining a position of a first party might comprise determining a horizontal position of the first party in a horizontal dimension of a plane parallel to a face of the display device. In some instances, adjusting an apparent field of view of the video might comprise panning the video in a horizontal direction, based on the determined horizontal position of the first party. According to some embodiments, determining a position of a first party might comprise determining a vertical position of the first party in a vertical dimension of a plane parallel to a face of the display device. In some cases, adjusting an apparent field of view of the video might comprise panning the video in a vertical direction, based on the determined vertical position of the first party.
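
These three mappings (distance to zoom, horizontal offset to horizontal pan, vertical offset to vertical pan) can be sketched as follows; the coordinate conventions, reference distance, and gain are illustrative assumptions rather than values from this disclosure:

    def apparent_view(viewer_x, viewer_y, viewer_distance,
                      reference_distance=2.0, pan_gain=0.5):
        """Map a viewer position (meters, relative to the display center)
        to zoom and pan parameters for the displayed video."""
        # Closer than the reference distance -> zoom out (broader apparent
        # field of view); farther away -> zoom in (narrower field of view).
        zoom = viewer_distance / reference_distance
        # Moving right shifts the apparent view left, and vice versa,
        # mimicking a physical window.
        pan_x = -pan_gain * viewer_x
        pan_y = -pan_gain * viewer_y
        return zoom, pan_x, pan_y

    print(apparent_view(0.5, 0.0, 1.0))  # viewer close to the screen, right of center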

According to some embodiments, adjusting an apparent view of the video call might comprise modifying, at the video calling device, a video signal received by the video calling device. In some cases, the video might be received from a second video calling device. Adjusting an apparent view of the video call might comprise instructing the second video calling device to adjust a view of one or more cameras of the second video calling device. In some instances, instructing the second video calling device to adjust a view of one or more cameras might comprise instructing the second video calling device to adjust a field of view of the one or more cameras. In some embodiments, the second video calling device might comprise an array of cameras. The field of view of the one or more cameras might comprise a field of view of a composite image captured by a plurality of cameras within the array of cameras. The apparent view of the video call might comprise a virtual perspective of the composite image. The virtual perspective might represent a perspective of the first party to the video call relative to the display device.

In some embodiments, instructing the second video calling device to adjust a view of one or more cameras might comprise instructing the second video calling device to adjust a perspective of the one or more cameras. In some cases, instructing the second video calling device to adjust a view of one or more cameras might comprise instructing the second video calling device to pan a camera in at least one of a horizontal dimension or a vertical dimension. According to some embodiments, instructing the second video calling device to adjust a view of a camera might comprise instructing the second video calling device to zoom a camera. In some instances, instructing the second video calling device to adjust a view of a camera might comprise instructing the second video calling device to crop frames of a video stream captured by the camera.
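
The frame-cropping variant mentioned above can be illustrated with a short sketch; the normalized viewer coordinates and the crop-placement rule are assumptions made for the example:

    def crop_for_viewer(frame, out_w, out_h, viewer_x_norm, viewer_y_norm):
        """Crop an out_w x out_h window from a wide captured frame.

        frame: image as a list of rows (e.g., a list of lists of pixels).
        viewer_x_norm / viewer_y_norm: viewer offset from screen center,
        normalized to [-1, 1]. Moving right (+x) slides the crop window
        left across the captured scene, mimicking a physical window.
        """
        src_h, src_w = len(frame), len(frame[0])
        max_dx = (src_w - out_w) // 2
        max_dy = (src_h - out_h) // 2
        cx = src_w // 2 - int(viewer_x_norm * max_dx)  # note the sign flip
        cy = src_h // 2 - int(viewer_y_norm * max_dy)
        left, top = cx - out_w // 2, cy - out_h // 2
        return [row[left:left + out_w] for row in frame[top:top + out_h]]

    wide = [[(r, c) for c in range(100)] for r in range(60)]  # fake 100x60 frame
    cropped = crop_for_viewer(wide, 60, 40, 0.5, 0.0)         # viewer right of center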

In some cases, the method might further comprise determining, with the video calling device, that the first party has moved relative to the display device, and modifying the apparent view of the video call, in response to determined movement of the first party. In some embodiments, modifying the apparent view of the video call might comprise modifying an apparent perspective of the video call, in response to determined movement of the first party. In some instances, modifying the apparent view of the video call might comprise modifying the apparent view of the video call substantially in real time with the determined movement of the first party.

According to some embodiments, the video calling device might comprise a camera, and determining a position of a first party to a video call might comprise capturing one or more images of the first party with the camera. In some cases, the one or more images might comprise a video stream. The method, in some instances, might further comprise transmitting the video stream to a second video calling device as part of the video call. In some instances, determining a position of a first party to a video call might further comprise analyzing the one or more images to identify the position of the first party. In some embodiments, analyzing the one or more images might comprise identifying, in the one or more images, positions of one or more eyes of the first party to the video call.
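
One plausible realization of the image-analysis step described above uses an off-the-shelf face detector to locate the viewer in each captured frame. OpenCV's bundled Haar cascade is assumed here purely for illustration (the disclosure names no particular detector, and mentions locating the eyes specifically; a face bounding box is used below for simplicity):

    import cv2

    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def viewer_position(frame_bgr):
        """Return the viewer's normalized (x, y) offset from the frame
        center, each in [-1, 1], or None if no face is found."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
        if len(faces) == 0:
            return None
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face wins
        frame_h, frame_w = gray.shape
        cx, cy = x + w / 2, y + h / 2
        return (2 * cx / frame_w - 1, 2 * cy / frame_h - 1)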

In still another aspect, an apparatus might comprise a computer readable medium having encoded thereon a set of instructions executable by one or more computers to cause the apparatus to perform one or more operations. The set of instructions might comprise instructions for determining a position of a first party to a video call relative to a display device that displays video of a second party to the video call, and instructions for adjusting an apparent view of the video of the second party to the video call, based at least in part on the determined position of the first party to the video call.

In another aspect, a system might comprise a video calling device and a computer. The video calling device might comprise at least one first processor and a first computer readable medium in communication with the at least one first processor. The first computer readable medium might have encoded thereon a first set of instructions executable by the at least one first processor to cause the video calling device to perform one or more operations. The first set of instructions might comprise instructions for determining a position of a first party to a video call relative to a display device that displays video of a second party to the video call. The computer might comprise one or more second processors and a second computer readable medium in communication with the one or more second processors. The second computer readable medium might have encoded thereon a second set of instructions executable by the one or more second processors to cause the computer to perform one or more operations. The second set of instructions might comprise instructions for adjusting an apparent view of the video of the second party to the video call, based at least in part on the determined position of the first party to the video call.

According to some embodiments, the video calling device might comprise the computer. In some embodiments, the video calling device might comprise a first video calling device. The system might further comprise a second video calling device that comprises a camera that records the video of the second party to the video call. In some cases, the instructions for adjusting an apparent field of view of the video of the second party to the video call might comprise transmitting, to the second video calling device, instructions for adjusting a field of view of the camera of the second video calling device. In some instances, the computer might be a control server separate from the video calling device. The computer, according to some embodiments, might be incorporated within a second video calling device that further comprises a camera that captures the video of the second party to the video call.

In some cases, the video calling device might comprise a video input interface to receive video input from a set-top box, an audio input interface to receive audio input from the set-top box, a video output interface to provide video output to a display device, an audio output interface to provide audio output to an audio receiver, a video capture device to capture video, an audio capture device to capture audio, a network interface, one or more third processors, and a third storage medium in communication with the one or more third processors. The third storage medium might have encoded thereon a third set of instructions executable by the one or more third processors to control operation of the video calling device. The third set of instructions might comprise instructions for controlling the video capture device to capture a captured video stream, instructions for controlling the audio capture device to capture a captured audio stream, instructions for encoding the captured video stream and the captured audio stream to produce a series of data packets, and instructions for transmitting the series of data packets on the network interface for reception by a second video calling device.

Various modifications and additions can be made to the embodiments discussed without departing from the scope of the invention. For example, while the embodiments described above refer to particular features, the scope of this invention also includes embodiments having different combinations of features and embodiments that do not include all of the above described features.

BRIEF DESCRIPTION OF THE DRAWINGS

A further understanding of the nature and advantages of particular embodiments may be realized by reference to the remaining portions of the specification and the drawings, in which like reference numerals are used to refer to similar components. In some instances, a sub-label is associated with a reference numeral to denote one of multiple similar components. When reference is made to a reference numeral without specification to an existing sub-label, it is intended to refer to all such multiple similar components.

FIG. 1 is a block diagram illustrating a system for modifying an apparent view(s) of displayed content, based at least in part on sensed presence and/or determined position(s) of a user in a room, in accordance with various embodiments.

FIGS. 2 and 3 illustrate fields of view, in accordance with various embodiments.

FIGS. 4A-4F are general schematic diagrams illustrating techniques for adjusting an apparent field of view of a display device, in accordance with various embodiments.

FIGS. 5A and 5B are general schematic diagrams illustrating techniques for adjusting apparent fields of view of a display device for multiple users, in accordance with various embodiments.

FIG. 6 is a general schematic diagram illustrating a windowed field of view in relation to a sensor field of view, in accordance with various embodiments.

FIGS. 7A and 7B are general schematic diagrams illustrating a display device in use with one or more image capture devices, in accordance with various embodiments.

FIG. 8 is a block diagram illustrating another system for modifying an apparent view(s) of displayed content, based at least in part on sensed presence and/or determined position(s) of a user in a room, in accordance with various embodiments.

FIG. 9 is a process flow diagram illustrating a method of providing a virtual window or for modifying an apparent view(s) of displayed content, based at least in part on sensed presence and/or determined position(s) of a user in a room, in accordance with various embodiments.

FIG. 10 is a generalized schematic diagram illustrating a computer system, in accordance with various embodiments.

FIG. 11 is a block diagram illustrating a networked system of computers, which can be used in accordance with various embodiments.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

While various aspects and features of certain embodiments have been summarized above, the following detailed description illustrates a few exemplary embodiments in further detail to enable one of skill in the art to practice such embodiments. The described examples are provided for illustrative purposes and are not intended to limit the scope of the invention.

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the described embodiments. It will be apparent to one skilled in the art, however, that other embodiments of the present invention may be practiced without some of these specific details. In other instances, certain structures and devices are shown in block diagram form. Several embodiments are described herein, and while various features are ascribed to different embodiments, it should be appreciated that the features described with respect to one embodiment may be incorporated with other embodiments as well. By the same token, however, no single feature or features of any described embodiment should be considered essential to every embodiment of the invention, as other embodiments of the invention may omit such features.

Unless otherwise indicated, all numbers used herein to express quantities, dimensions, and so forth should be understood as being modified in all instances by the term “about.” In this application, the use of the singular includes the plural unless specifically stated otherwise, and use of the terms “and” and “or” means “and/or” unless otherwise indicated. Moreover, the use of the term “including,” as well as other forms, such as “includes” and “included,” should be considered non-exclusive. Also, terms such as “element” or “component” encompass both elements and components comprising one unit and elements and components that comprise more than one unit, unless specifically stated otherwise.

Features Provided by Various Embodiments

Presence Detection Functionalities

Presence Detection Devices (“PDDs”) or Image Capture Devices (“ICDs”) provided by various embodiments can contain or communicate with, inter alia, cameras, microphones, and/or other sensors (including, without limitation, infrared (“IR”) sensors). These sensors, in conjunction with the internal processing capability of the device, can allow the device to detect when a person is in the room. Additionally, through means such as facial recognition and voice detection, or the like, the devices also can automatically recognize who is in the room. More specifically, such devices can detect the presence of a particular individual. In some aspects, ICDs might contain or communicate with, inter alia, image capture devices for capturing images or video of the person or people in the room. In some cases, ICDs might also contain or communicate with, inter alia, microphones, and/or other sensors (including, without limitation, infrared (“IR”) sensors). According to some embodiments, some ICDs might have similar functionality as PDDs.

In various embodiments, presence detection can be local and/or cloud based. In the case of local presence detection, the PDD or ICD itself might keep a list of all user profiles and will attempt to match an individual against its local list of all users. In cloud based detection, the functionality of user detection can be moved into servers in the cloud. A cloud based approach allows detection of a user's presence to be mobile among various devices (whether or not owned by, and/or associated with, the user). That same user can be detected on his or her device or on any other device that has the same capability and that is tied into the same cloud infrastructure.

The ability to automatically detect the presence of an individual on any device presents a powerful new paradigm for many applications including automation, customization, content delivery, gaming, video calling, advertising, and others. Advantageously, in some embodiments, a user's content, services, games, profiles (e.g., contacts list(s), social media friends, viewing/listening/gaming patterns or history, etc.), video mail, e-mail, content recommendations, determined advertisements, preferences for advertisements, and/or preferences (e.g., content preferences, content recommendation preferences, notification preferences, and/or the like), etc. can follow that user from device to device, including devices that are not owned by (or previously associated with) the individual, as described in detail in the '279 application (already incorporated herein). Alternatively, or in addition, presence detection functionality can also allow for mobile presence detection that enables remote access and control of ICDs over a network, following automatic identification and authentication of the user by any device (e.g., PDD, ICD, or other device) so long as such device has authentication functionality that is or can be tied to the access and control of the ICDs, regardless of whether or not such device is owned by or associated with the user. In other words, the ability to remotely access and control one's ICDs over a network can follow the user wherever he or she goes, in a similar manner to the user's content and profiles following the user as described in the '279 application. Such remote control of ICDs, as well as post-processing of video and/or image data captured by the ICDs, is described in detail in the '263 application (which is already incorporated by reference herein).

Various sensors on a PDD or an ICD (and/or a video calling device) can be used for user detection. Facial recognition can be used to identify a particular individual's facial characteristics, and/or voice detection can be used to uniquely identify a person. Additionally, PDDs, ICDs, and/or video calling devices may also have local data storage. This local data storage can be used to store a database of user profiles. The user profiles can contain the various mechanisms that can be used to identify a person, including username and password, facial characteristics, voice characteristics, etc. When sensors detect the facial features or capture the voice of a particular individual, that captured presence information can be compared against the characteristics of the users on the local storage. If a match is found, then the individual has been successfully identified by the device. (As used herein, the term “presence information” can be any data or information that can be used to determine the presence of a user, and/or to identify and/or authenticate such a user. As such, presence information can include raw image, video, or audio data, analyzed data (e.g., video or image data to which preliminary facial recognition procedures, such as feature extraction, have been employed, as well as verification of audio self-identification or verification of audio challenge/response information), the results of such analysis, and even the end result of the detection process—i.e., a notification that a user is present and/or an identification of the user.)
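
The local matching step described above might look like the following sketch, in which captured facial or voice characteristics are reduced to feature vectors and compared against each stored profile; the vectors, distance metric, and threshold are illustrative assumptions, not the disclosure's actual method:

    import math

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def identify(captured_features, profiles, threshold=0.6):
        """profiles: dict mapping username -> stored feature vector.
        Returns the best-matching user, or None if nothing is close enough."""
        best_user, best_dist = None, float("inf")
        for user, stored in profiles.items():
            d = euclidean(captured_features, stored)
            if d < best_dist:
                best_user, best_dist = user, d
        return best_user if best_dist < threshold else None

    profiles = {"bob": [0.1, 0.9, 0.3], "alice": [0.8, 0.2, 0.5]}
    print(identify([0.12, 0.88, 0.31], profiles))  # -> 'bob'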

Detection of a user's presence can also be performed via proximity of a PDD, an ICD, and/or a video calling device to another device. For example, if a user's mobile phone, smart phone, tablet, or PC is near the PDD, the ICD, and/or the video calling device, that person is automatically detected. In some instances, a unique device identifier for each of a user's devices might have previously been associated with the user's profile in a cloud database or the like (i.e., making the user's devices “known devices”), and detection of such unique device identifiers might serve as a basis for identifying the user, or might streamline the identification process by verifying whether the person with the device owned by or associated with the known device is the user or simply someone in possession of the device(s) (whether lawful or unlawful). Such verification might comprise one or more of facial recognition, voice recognition, audio challenge/response verification, biometric analysis, or the like. In some cases, audio challenge/response verification might include analysis of sub-vocal responses from the person challenged, to prevent undesired casual overhearing of audio passwords, audio keyphrases, or the like. In some instances, biometric analysis might include analysis of any suitable biometric (aside from facial and voice recognition) selected from a group consisting of fingerprint, iris, pupil, height, unique scar(s), other unique physical characteristics, and/or any combination of these biometrics. To capture biometric information such as fingerprints, iris, pupil, height, scar, or other unique physical characteristics, which might be image-based biometrics (which might be captured by a high resolution image capture device of the PDD, the ICD, and/or the video calling device), the PDD, the ICD, and/or the video calling device might prompt the person being detected to position himself or herself so that his or her fingerprints, iris, pupil, full body, scar, or other unique physical characteristics, respectively, are appropriately facing the image capture device of the PDD and/or the ICD.

In some embodiments, with detection of known devices and with automatic detection/identification processes being enabled, it may be possible for the system to identify persons not normally associated with a known device being in possession of the known device. In such a case, the system might notify the original user (via e-mail or other forms of communication indicated in the user's profile, or the like) of the situation. In some instances, the user might indicate that the unknown person does have authority or permission to use, or be in possession of, the user's device. In other cases, where the user indicates that the unknown person does not have authority or permission to use the device, the user may be given options to proceed, including, without limitation, options to lock data, options to lock device functions, options to activate location tracking (including, without limitation, global positioning system (“GPS”), global navigation satellite system (“GNSS”), etc.) of the device (in case the system loses track of the device; e.g., in the case the device moves outside the range of the system's sensor/detection/communications systems), options to contact the unknown person, options to activate speakers to emit sirens, options to activate displays or lights (e.g., light emitting diodes (“LEDs”), organic LEDs (“OLEDs”), liquid crystal displays (“LCDs”), etc.), and/or options to notify authorities (e.g., police or other law enforcement personnel) of the situation and/or the location of the device (e.g., GPS coordinates, or the like), etc.

Additionally and/or alternatively, proximity detection can be done using GNSS location tracking functionality, which can be found in many electronic devices, by authenticating the user when the secondary device is within a predefined distance of the PDD, the ICD, and/or the video calling device. Proximity detection can also be done wirelessly via Bluetooth or WiFi. With respect to Bluetooth, if the secondary device pairs with the PDD, the ICD, and/or the video calling device, the user can be considered detected. With respect to WiFi, one approach could be to see if the secondary device associates with the same WiFi access point to which the PDD, the ICD, and/or the video calling device is connected. Another approach to proximity detection is the use of near-field communications (“NFC”) commonly found in many electronic devices. When the secondary device is within range of the PDD, the ICD, and/or the video calling device, an NFC detector can be used to determine that the user is in the room. From these examples, a skilled reader should appreciate that many different techniques can be used to detect presence based on device proximity.
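
As a sketch of the WiFi approach just mentioned (the inventory of known devices and their reporting mechanism are assumptions made for the example), presence can be inferred when a known personal device reports the same access point as the PDD or ICD:

    def user_nearby(pdd_bssid, known_devices, device_reports):
        """known_devices: device_id -> owning user.
        device_reports: device_id -> BSSID that device is currently using.
        Returns the first user whose device shares our access point."""
        for device_id, user in known_devices.items():
            if device_reports.get(device_id) == pdd_bssid:
                return user
        return None

    known = {"bobs-phone": "bob"}
    reports = {"bobs-phone": "aa:bb:cc:dd:ee:ff"}
    print(user_nearby("aa:bb:cc:dd:ee:ff", known, reports))  # -> 'bob'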

According to some embodiments, regardless of the specific manner in which the user's electronic device, personal device, or user device is detected, presence may be determined or inferred by knowing the location of the personal device (which might include, without limitation, at least one of a laptop computer, a smart phone, a mobile phone, a portable gaming device, a desktop computer, a television, a set-top box, or a wearable computing device, and/or the like). When the personal device is close to the display device (or the PDD, ICD, and/or video calling device), it may be determined that the personal device (and hence the user associated with the personal device) is present. Based on the presence of the user and information about the user, advertisement content (which may be determined to be relevant to the user) may be sent to the display device. In this manner, highly targeted advertising may be implemented (which may be embodied, in some cases, as a highly targeted form of television advertisement, which may be thought of as being similar to what is done on web browsers today, but much more targeted). In a similar manner, recommendations of media content and/or (in some cases, automatic) presentation of recommended media content may also be based on the presence of the user and information about the user. From the user's perspective, when he or she is in the room, recommended media content and/or advertisements on the display device (e.g., a TV or the like) may become customized to him or her (based on detection of the presence of the user and/or based on detection of the presence of his or her personal device, and, in some cases, based also on the user's profile, other information about the user, and/or the like). In some embodiments, the PDD/ICD/video calling device may be one of the personal device itself, a computer/server in the cloud, and/or the personal device in conjunction with some computer/server in the cloud, or the like. The recommended media content and/or advertisement may be sent to a local content source (e.g., an STB or the like) or another PDD/ICD/video calling device that has the ability to control content being played or sent to the display device (and/or, of course, to receive the recommended media content and/or advertisement from a content server). Such a method or apparatus may allow for the targeted presentation (or, in some cases, selling) of recommended media content and/or advertisements directly to the display device (e.g., TV or the like), based on characteristics of the user. In some cases, among other information about the user that can be taken into account, determination of recommended media content and/or advertisements to send to the display device might be based on, or might otherwise take into account, the user's Internet browsing history, the user's Internet browsing patterns, the user's Internet browser bookmarks/favorites, and/or the like.

In some embodiments, detection of an individual can be fully automatic and might (in some instances) require no user interaction. For example, the system can characterize an individual's facial features (and/or unique physical characteristics or other biometrics) automatically, detect the presence of a secondary device, characterize an individual's voice print automatically, etc. Several detection methods can be used in combination to reduce errors in the detection process. For example, if the system detects a person in the room and first identifies that person's facial features, it can then prompt them for voice (e.g., “Bob, is that you?”). Once the user's voice is captured, that audio sample can be compared against the stored voice characteristics for that user, to reduce false detection. Another approach for the second step may be to prompt the user to speak a PIN or password to be compared against what is stored in the user profile. Using this approach, the characteristics of the speech (e.g., user's voice, cadence, syntax, diction) and the content of the speech (e.g., a PIN or password) can be jointly used to reduce false detections. To prevent eavesdropping of passwords or PINs, the audio capture device might be configured to capture sub-vocalizations of the passwords or PINs, for analysis. Alternatively and/or additionally, the system can prompt the user to position his or her body so as to allow the image capture device to face one or more of the user's fingers (e.g., for fingerprint analysis), the user's eyes (e.g., for iris and/or pupil analysis), the user's full body (e.g., for height analysis), portions of the user's body (e.g., for analysis of scars or other unique physical characteristics, or the like), etc.

In some embodiments, physical geography can be used as a metric in detection to reduce the possibility of errors. For example, if a user is known to use the system in Dallas, Tex., and then is detected in Madrid, Spain, the system can weigh detection in Spain lower than detection in Dallas. Additionally, if the user is detected in Spain, a secondary authentication method may optionally be invoked to reduce false detection. According to some embodiments, in the case that the system has access to profile or other personal information of the user such as communications, calendar items, contacts list, travel/itinerary information, or the like that might indicate that the user might be visiting a friend or relative in Spain having a similar PDD, ICD, and/or video calling device linked to a common network or cloud server, the system might determine that the user is or will be in Spain. In such a case, the user's profiles, media content, preferences, content recommendations, determined advertisements, preferences for advertisements, or the like (or access thereto) might be sent to the friend's or relative's device in Spain or to a local data center or the like to allow the user to access the user's own content or profiles on the friend's or relative's device during the visit; in particular embodiments, the user's profiles might include access and control information for remotely accessing and controlling the user's ICDs over a network, while the user's content might include image data and/or video data captured by the user's ICDs (either in raw or processed form). After the scheduled visit, it may be determined using any combination of the user's personal information, the user's devices (including the user's PDD, ICD, and/or video calling device, mobile devices, etc.), and/or the friend's or relative's device whether the user has left the friend's or relative's location (in this example, Spain). If so determined, the content and profiles (or access thereto, as the case may be) might be removed from the friend's or relative's device (and/or from the data center or the like that is local to said device).
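
The geographic weighting described above can be sketched as scaling a raw match score by distance from the user's usual location, so that a distant detection falls below a threshold and triggers secondary authentication; the falloff constant and scores below are invented for the example:

    import math

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two points, in kilometers."""
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def weighted_confidence(match_score, detect_loc, home_loc, falloff_km=2000.0):
        """Scale a raw match score down the farther the detection site is
        from the user's usual location; a low result should prompt a
        secondary check (e.g., voice or PIN)."""
        dist = haversine_km(*detect_loc, *home_loc)
        return match_score / (1.0 + dist / falloff_km)

    dallas, madrid = (32.78, -96.80), (40.42, -3.70)
    print(weighted_confidence(0.9, dallas, dallas))  # ~0.9, detection at home
    print(weighted_confidence(0.9, madrid, dallas))  # much lower, verify further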

In particular embodiments, a PDD, an ICD, and/or a video calling device can also be connected to a network, such as the Internet. In such a scenario, the database of user profiles, including identifiable facial and/or voice characteristics, as well as other identifying information (e.g., passwords, identifying information for other devices owned by the user, etc.), can be stored on servers located in the cloud, i.e., on the network or in a distributed computing system available over the network. In some cases, the distributed computing system might comprise a plurality of PDDs, a plurality of ICDs, and/or a plurality of video calling devices in communication with each other either directly or indirectly over the network. The distributed computing system, in some instances, might comprise one or more central cloud servers linking the plurality of PDDs, the plurality of ICDs, and/or the plurality of video calling devices and controlling the distribution and redundant storage of media content, access to content, user profiles, user data, content recommendations, determined advertisements, preferences for advertisements, and/or the like. When an individual's facial features are detected by a PDD, an ICD, and/or a video calling device, those features (and/or an image captured by the PDD, the ICD, and/or the video calling device) can be sent to a server on the network. The server then can compare the identifiable facial features against the database of user profiles. If a match is found, then the server might inform the device of the identity of the user and/or might send a user profile for the user to the device.

User profiles, including facial characteristics, can be stored both locally on the device and on a server located in the cloud. When using both device-based and cloud-based databases, user identification can be performed by first checking the local database to see if there is a match, and if there is no local match, then checking the cloud-based database. The advantage of this approach is that it is faster for user identification in the case where the user profile is contained in the local database. In some embodiments, the database on the device can be configured to stay synchronized with the database in the cloud. For example, if a change is made to a user profile on the device, that change can be sent to the server and reflected on the database in the cloud. Similarly, if a change is made to the user profile in the cloud-based database, that change can be reflected on the device database.
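
A minimal sketch of this two-tier lookup follows; the cloud client interface is hypothetical, and the match function could be the identify() sketch shown earlier:

    def lookup_user(features, local_db, cloud_client, match_fn):
        """local_db maps username -> stored feature vector; match_fn
        returns a username or None. Tries the local database first,
        then falls back to the cloud."""
        user = match_fn(features, local_db)
        if user is not None:
            return user, "local"
        result = cloud_client.identify(features)  # assumed remote call
        if result is not None:
            username, stored = result
            # Cache the cloud profile locally so the next lookup is fast,
            # keeping the device database synchronized with the cloud copy.
            local_db[username] = stored
            return username, "cloud"
        return None, None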

Matching presence information or identifying information with an individual having a user profile can be a form of authentication in some embodiments. User profiles can also contain information necessary for many authentication mechanisms. Such information may include challenge/response pairs (such as username and password combinations, security question/pass phrase combinations, or the like), facial recognition profiles, voice recognition profiles, and/or other biometric information, such as fingerprints, etc. An individual may be authenticated using any combination of such techniques.

In some cases, the system can also determine when a user is no longer present. Merely by way of example, a PDD, an ICD, and/or a video calling device might continually (or periodically) monitor for the user's presence. For instance, in the case of facial recognition, the device can continually check to detect whether a captured image includes the user's face. With voice recognition, after a period of inactivity, the device might prompt the user to confirm that he or she is still there (e.g., “Bob, are you still there?”).

According to some embodiments, user profiles can work across heterogeneous networks. Not all user devices need to be the same. Some user devices might be PDDs, ICDs, and/or video calling devices. Other user devices might be computers, tablets, smart phones, mobile phones, etc. Each device can use any appropriate method (based on device capabilities) to determine the presence of, identify, and/or authenticate the user of the device with a user profile.

In an aspect, this automated presence detection can be used to provide user information (e.g., content, content recommendations, determined advertisements, preferences for advertisements, and/or services) to an identified user. With a PDD, an ICD, and/or a video calling device, when a user enters the room, and the camera sensors detect that user's facial features (or other biometric features) and authenticate the individual, the content associated with that user profile (including, without limitation, profile information for handling media content, for handling content recommendations, for handling notification of content recommendations, for handling determination of advertisements, for handling presentation of advertisements, and/or the like) can automatically become available to that individual. Additionally, with the cloud-based authentication approach described herein, that user's content, content recommendations, determined advertisements, preferences for advertisements, and/or profiles can become available on any device. More specifically, if a user is identified by another PDD, ICD, and/or video calling device, then his or her content (e.g., media content, and/or the like), content recommendations, determined advertisements, preferences for advertisements, profiles, etc., become available to him or her even if the PDD, ICD, and/or video calling device that he or she is in front of is not the user's own device. This functionality allows a new paradigm in which the user's content, content recommendations, determined advertisements, preferences for advertisements, and/or profiles follow the user automatically. Similarly, when upgrading PDDs, ICDs, and/or video calling devices, detection, identification, and authentication of the user on the new device can allow automatic and easy porting of the user's content, content recommendations, determined advertisements, preferences for advertisements, and/or profiles to the new device, allowing for an ultimate type of “plug-and-play” functionality, especially if the profiles include information on configurations and settings of the user devices (and interconnections with other devices).

PDDs, ICDs, and/or video calling devices also are capable of handling, transmitting, and/or distributing image captured content, which can include, but is not limited to, video mail and/or video mail data captured or recorded by the video calling devices. In some cases, the video mail and/or video mail data might be raw data, while in other cases they might be post-processed data. Video mail and/or video mail data can be stored on servers in the cloud, on PDDs, ICDs, and/or video calling devices in the cloud, and/or locally on a particular user device. When accessing video mail and/or video mail data from another device, the first PDD and/or video calling device that has the video mail and/or video mail data stored thereon needs to serve the video mail and/or video mail data to the new device that the user is using. In order to do this, the new PDD, ICD, and/or video calling device might need to get a list of video mail and/or video mail data that is stored on the first PDD and/or video calling device. This can, in some embodiments, be facilitated via a server that is in the cloud that all PDDs, ICDs, and/or video calling devices are always or mostly connected to. The server can communicate with all PDDs, ICDs, and/or video calling devices and help send messages between PDDs, ICDs, and/or video calling devices. When a user is authenticated with a new PDD, ICD, and/or video calling device, the new device can request the list of video mail and/or video mail data from the first device. If the user requests video mail and/or video mail data from the new device, then the first PDD, ICD, and/or video calling device (or the other user device) can serve the video mail and/or video mail data to the new device. This can be done either directly in a peer-to-peer fashion and/or can be facilitated by the server. For instance, in some cases, peer-to-peer sessions might be initiated using a server, and after a peer-to-peer session has been initiated or established by the server, the server may be bypassed, resulting in a direct peer-to-peer connection or session. In some embodiments, this communication can be accomplished by using protocols such as XMPP, SIP, TCP/IP, RTP, UDP, etc. Video mail capture, processing, and distribution is described in detail in the '499 application, which is already incorporated herein by reference.
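
The server-brokered retrieval just described might flow as in the following sketch; the message names, the relay interface, and the peer connection object are all illustrative assumptions, not a protocol defined by this disclosure (which names XMPP, SIP, RTP, and the like only as candidate transports):

    import json

    def request_videomail_list(relay, new_device_id, first_device_id):
        """Ask the first device, via the cloud relay server, for the list
        of video mail it has stored."""
        msg = {"type": "LIST_VIDEOMAIL", "from": new_device_id}
        reply = relay.send(first_device_id, json.dumps(msg))  # assumed call
        return json.loads(reply)["items"]

    def fetch_item(peer_conn, item_id):
        """After the server has brokered a direct peer-to-peer session,
        pull one video mail item straight from the first device."""
        peer_conn.send(json.dumps({"type": "GET_ITEM", "id": item_id}))
        return peer_conn.recv()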

As discussed above, identification and authentication of a user by a PDD, an ICD, and/or a video calling device (whether or not associated with or owned by the user) can provide the user with remote access and control of the user's PDD(s), ICD(s), and/or video calling device(s) over a network (e.g., by porting the user's profiles associated with remote access and control of the user's device(s), and/or the like to the current PDD, ICD, and/or video calling device in front of which the user is located). This functionality allows the user to remotely access media content, to remotely access and modify settings for content recommendations, to remotely access and modify settings for advertisements, and to remotely access and modify user profiles, and/or the like.

Master Account

Some embodiments employ a master account for access to a video calling device. In an aspect, a master account can be created on a per user basis. This master account might serve as the top-level identifier for a particular user. In some cases, the master account may be used to manage, control, and monitor a user's camera(s) and/or other device functionalities (whether hardware- and/or software-based). Additionally, the master account can be used to control any account or device level services that are available.

For example, an email account and password can be used as a master account to manage a user's settings for accessing media content, for accessing and modifying settings for content recommendations, for accessing and modifying settings for advertisements, and for accessing and modifying user profiles, and/or the like.

Device Association

For proper management and control of a PDD, ICD, and/or video calling device, some embodiments provide the ability to reliably associate a PDD, ICD, and/or video calling device with a master account (i.e., assign the device to the master account). When a PDD, ICD, and/or video calling device is associated with an account, then it can be managed and controlled from within the master account. Association ensures that a PDD, ICD, and/or video calling device is being controlled by the appropriate user and not an unauthorized user.

A PDD, ICD, and/or video calling device may be associated with a particular master account at the time of device setup. During device setup, the user is prompted to enter a master account and password. When doing so, a secure communications channel may be opened between the video calling device and the servers. Then, a unique and difficult-to-guess key can be sent from the device to the server. Servers that have a master list of all keys can then associate that particular device, via its serial number, with a particular master account. A feature of this approach is that a user only needs to enter a password at the time of device setup. The user never needs to enter a password again, and in fact, passwords do not need to be stored on the device at all, which makes this approach very secure.
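The following is a minimal sketch of this association step, assuming a random 256-bit key and a server-side hash table. The key length, the hashing, and the storage layout are not specified in the text above, so all of those details are illustrative only.

```python
import secrets
import hashlib

def generate_association_key():
    """Device-side: create a unique, hard-to-guess key once, at setup."""
    return secrets.token_hex(32)  # 256 bits of randomness (an assumption)

class AssociationServer:
    """Server-side: holds the master list of keys and maps each device
    serial number to a master account."""

    def __init__(self):
        self._by_key_hash = {}  # hash(key) -> (serial, master_account)

    def register(self, key, serial, master_account):
        # Store only a hash so the raw key never sits in the database.
        digest = hashlib.sha256(key.encode()).hexdigest()
        self._by_key_hash[digest] = (serial, master_account)

    def lookup(self, key):
        digest = hashlib.sha256(key.encode()).hexdigest()
        return self._by_key_hash.get(digest)

# Example: setup-time association, after the user has entered the master
# account credentials over the secure channel.
key = generate_association_key()
server = AssociationServer()
server.register(key, serial="SN-0001", master_account="user@example.com")
assert server.lookup(key) == ("SN-0001", "user@example.com")
```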

Device Management and Remote Configuration

Once a device has been associated with a master account, it may be managed from the master account via an interface such as a web interface, in accordance with some embodiments. The communication link between the device and the server may, in some cases, always be encrypted and authenticated. This ensures that messages between the device and the server are secure and that the device knows it is communicating with the server on behalf of the appropriate master account. Once the secure and authenticated link is established, devices can connect to the server and are able to send and receive commands.

The device and server can have a common set of command codes and responses. Servers can send commands down to the camera(s) to enact specific behavior. For example, the server can send remote configuration commands, such as commands for changing the device address, changing the nickname that is associated with the device, and/or changing the avatar image associated with the device. In addition to configuration, the commands can be used to enact specific behavior on the device, such as running network tests or taking a live image(s) from the video calling device. New commands and features can be added by extending the set of command codes on the device and server.
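As a rough illustration of such a common command-code set, the sketch below maps numeric codes to device-side handlers; the specific codes and handler names are invented for this example, and only two handlers are implemented here.

```python
# Hypothetical shared command-code set; the server keeps an identical copy.
COMMANDS = {
    0x01: "set_device_address",
    0x02: "set_nickname",
    0x03: "set_avatar",
    0x10: "run_network_test",
    0x11: "capture_live_image",
}

class Device:
    def __init__(self):
        self.config = {"address": None, "nickname": None}

    def set_device_address(self, value):
        self.config["address"] = value
        return {"ok": True}

    def set_nickname(self, value):
        self.config["nickname"] = value
        return {"ok": True}

    def handle(self, code, payload=None):
        """Look up the command code and run the matching handler;
        unknown or unimplemented codes get a structured error."""
        name = COMMANDS.get(code)
        handler = getattr(self, name, None) if name else None
        if handler is None:
            return {"ok": False, "error": f"unsupported command 0x{code:02x}"}
        return handler(payload)

# New commands are added by extending COMMANDS on both device and server
# and defining a matching handler on the device.
device = Device()
print(device.handle(0x02, "Living Room Camera"))  # -> {'ok': True}
```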

Virtual Window Concept

A set of embodiments can provide a “virtual window” that includes an apparent view of video content (or still images) that corresponds to a user's position with respect to the display device (such as a television or other display device) on which the video content (or still images) is displayed. In some instances, the video content might include video of parties in a video call, video of media content (e.g., movie content, television program content, gaming content, advertisement content, and/or the like), video of a live video feed, and/or the like. In some cases, embodiments can also adjust audio (which might be an audio track of the video content or might be a standalone audio stream with no accompanying video), using similar techniques, based on the position of a listener with respect to a video display (or any other specified point). With respect to video, the effect of some embodiments is to make the displayed video appear to the user as if the user is watching the video through a virtual window, such that the apparent view of the video changes depending on the user's location relative to the virtual window (i.e., the display device or the like) and can be modified in real-time (or near real-time) if the user moves with respect to the display device. Thus, the term “virtual window” is used only for purposes of illustrating the concepts described herein and should not be considered limiting in any way.

The “apparent view” that can be adjusted by various embodiments can include an apparent field of view and/or an apparent perspective on the video. With regard to a scene displayed in a video (or still image), an “apparent field of view,” as used herein, means the field of view (i.e., the portion of the scene that is displayed) that the user perceives when watching the video (which is analogous to the field of view of a real or virtual camera that captured the scene depicted in the video). An “apparent perspective” is the perspective (e.g., above, below, straight in front, on one side or the other, or any suitable combination of these perspectives) from which the user perceives that he or she is viewing the scene depicted in the video, and it is analogous to the perspective of the real or virtual camera that captured the scene displayed in the video. (The term “virtual camera” is used to convey an embodiment in which the displayed video is not actually live-filmed video but is generated, such as animated video or video from a video game; such generated video has a field of view and a perspective, just as live-recorded video does, and these are represented by the virtual camera.)

Herein, description of movement of a user's eyes might refer to physical movement of the user's eyes relative to the display device, and not merely rotation of the user's eyes (which is merely a change in the focus of the user's visual field of view and, in some cases, might not affect the displayed field of view through the virtual window). In other words, physically moving so as to change the position of one's eyes along the x, y, or z directions relative to a virtual window might change the field of view looking through the window, but simply rotating one's eyes (without changing the position of one's eyes along any of the x, y, or z directions relative to the virtual window) might not affect the field of view looking through the virtual window.

Exemplary Embodiments

FIGS. 1-11 illustrate exemplary embodiments that can provide some or all of the features described above. The methods, systems, and apparatuses illustrated by FIGS. 1-11 may refer to examples of different embodiments that include various components and steps, which can be considered alternatives or which can be used in conjunction with one another in the various embodiments. The description of the illustrated methods, systems, and apparatuses shown in FIGS. 1-11 is provided for purposes of illustration and should not be considered to limit the scope of the different embodiments.

FIG. 1 illustrates an exemplary environment that can provide some or all of the features described herein, including, but not limited to, modifying an apparent view(s) of displayed content (including, without limitation, video call content, media content, and/or the like), based at least in part on sensed presence and/or determined position(s) of a user in a room, in accordance with various embodiments. More specifically, FIG. 1 illustrates a functional diagram of a system 100 for controlling one or more presence detection devices (“PDDs”), one or more image capture devices (“ICDs”), and/or one or more video calling devices (labeled user devices 105 in FIG. 1 for ease of illustration, but described herein as PDDs, ICDs, or video calling devices, each of which can be considered a type of user device). The skilled reader should note that the arrangement of the components illustrated in FIG. 1 is functional in nature, and that various embodiments can employ a variety of different structural architectures. Merely by way of example, one exemplary, generalized architecture for the system 100 is described below with respect to FIG. 11, but any number of suitable hardware arrangements can be employed in accordance with different embodiments.

An ICD 105, a video calling device 105, or a PDD 105 can be any device that is capable of communicating with a control server 110 over a network 115 and that can provide any of a variety of types of advertisement determination functionality, content recommendation functionality, video communication functionality, presence detection functionality, and/or the like. Merely by way of example, in some aspects, an ICD 105, a video calling device 105, or a PDD 105 can be capable of providing pass-through video/audio to a display device (and/or audio playback device) from another source (such as a local content source), and/or of overlaying such video/audio with additional content generated or received by the ICD 105, the video calling device 105, or the PDD 105. In other aspects, an ICD 105, a video calling device 105, or a PDD 105 can comprise one or more sensors (e.g., digital still cameras, video cameras, webcams, security cameras, microphones, infrared sensors, touch sensors, and/or the like), and/or can be capable, using data acquired by such sensors, of sensing the presence of a user, identifying a user, and/or receiving user input from a user; further, an ICD 105, a video calling device 105, or a PDD 105 can be capable of performing some or all of the other functions described herein and/or in any of the Related Applications. Hence, in various embodiments, an ICD 105, a video calling device 105, or a PDD 105 can be embodied by a video calling device, such as any of the video communication devices (“VCDs”) described in the '182 patent, a video game console, a streaming media player, and/or the like, to name a few non-limiting examples.

In one aspect of certain embodiments, as described more fully with respect to FIG. 8 below (or as described in the Related Applications), an ICD 105, a video calling device 105, or a PDD 105 can be placed functionally inline between a local content source and a display device. A local content source can be any device that provides an audio or video stream to a display device and thus can include, without limitation, a cable or satellite set-top box (“STB”), an Internet Protocol television (“IPTV”) STB, and devices that generate video and/or audio, and/or acquire video and/or audio from other sources, such as the Internet, and provide that video/audio to a display device; hence, a local content source can include devices such as a video game console, a Roku® streaming media player, an AppleTV®, and/or the like. When situated functionally inline between a local content source and a display device, the ICD, the video calling device, or the PDD can receive an audiovisual stream output from the local content source, modify that audiovisual stream in accordance with the methods described herein, in the '182 patent, and/or in the '279 application, and provide the (perhaps modified) audiovisual stream as input to the display device. It should be noted, however, that, in some cases, the functionality of a local content source can be incorporated within an ICD, a video calling device, or a PDD, and/or the functionality of an ICD, a video calling device, or a PDD can be incorporated within a local content source; further, it should be appreciated that an ICD, a video calling device, or a PDD (which might or might not include local content source functionality) can be disposed inline with one or more other local content sources or one or more other video calling devices/PDDs. Hence, for example, an ICD, a video calling device, or a PDD with some local content source functionality (such as a video game console) might be disposed inline between one or more other local content sources or one or more other ICDs/video calling devices/PDDs (such as a cable STB, satellite STB, IPTV STB, and/or a streaming media player) and a display device.

In an aspect of some embodiments, the system can include a software client that can be installed on a computing device (e.g., a laptop computer, wireless phone, tablet computer, etc.) that has a built-in camera and/or has a camera attached (e.g., a USB webcam). This client can act as an interface to allow remote control of the built-in and/or attached camera on the computing device. In some embodiments, the computing device might have a built-in microphone(s) and/or an attached microphone(s) (e.g., a table-top microphone, a wall-mounted microphone, and/or a microphone removably mountable on a television, on the ICD, on the video calling device, on the PDD, and/or on some other suitable user device, or the like). The software client can alternatively and/or additionally act as an interface to allow remote control of the built-in and/or attached microphone on the computing device. In some cases, the camera and/or microphone can be automatically or autonomously controlled to obtain optimal video and/or audio input. Remote control of the video calling device and/or PDD is described in detail in the '263 application (already incorporated herein), and may be similarly applicable to remote control of the ICD.

The system 100 can further include a control server 110, which can have any suitable hardware configuration; an example of one such configuration is described below in relation to FIG. 11. In one aspect, the control server 110 is a computer that is capable of receiving user input via a user interface 120 and/or performing operations for utilizing the ICD(s) 105, the video calling device(s) 105, and/or the PDD(s) 105 to perform one or more of: receiving (and relaying) media content (either directly from a media content server or database (neither shown) via network 115, indirectly via a local content source (e.g., an STB or the like), directly from cloud storage system 130, and/or the like); monitoring the media content presented to the user(s); monitoring the user(s); sending the monitored data to the control server 110; determining content recommendations; determining at least one advertisement for the user(s) with the control server 110; receiving the at least one advertisement for the user(s) from the control server 110; presenting the at least one advertisement to the user(s); determining position(s) of the user(s) (and/or the user(s)'s eyes) relative to a display device; adjusting the apparent view of the content displayed on the display device based at least in part on the determined position(s) of the user(s) (and/or the user(s)'s eyes) relative to the display device; and/or the like. In some cases, the control server 110 might handle all of the processes for identifying and authenticating users and for providing access to the user(s)'s profiles, content, information, recommendations, advertisements, and preferences (including, without limitation, preferences for advertisements and other user preferences, etc.), as well as handling the processes involved with determining or presenting the advertisements, handling processes involved with position(s) determination of the user(s) (and/or eyes of the user(s)), and handling modification/adjustment of the apparent view of content displayed on a display device based on the determined position(s) of the user(s) (and/or eyes of the user(s)). Alternatively, or additionally, the processes involved with position(s) determination of the user(s) (and/or eyes of the user(s)) and/or handling modification/adjustment of the apparent view of content displayed on a display device based on the determined position(s) of the user(s) (and/or eyes of the user(s)) might be handled by the user device 105 corresponding to the user(s) and/or to the display device. In other instances, control server 110 and the particular user device 105 might split the processing tasks in any suitable manner, as appropriate.

Merely by way of example, in some embodiments, the control server 110 can detect user presence, identify/authenticate users, and/or enable the user to remotely access the user's master account, user preferences, media content, recommendations of media content, advertisements, preferences for advertisements, and/or the like. In other cases, the control server 110 can receive and/or store user input and/or user preferences that can specify whether and how presence information should be used, whether and how the user's ICD(s), video calling device(s), and/or PDD(s) may be used in a distributed infrastructure, whether and how the user's content and profiles should be handled under certain situations, and/or the like.

For example, preferences might specify which account information, content, profile information, personal communications (e.g., videomail, voicemail, e-mail, etc.), media content, media content recommendations, determined advertisements, preferences for advertisements, and/or the like should be delivered to a user when present at a device not owned by the user, and whether presence information should be collected for that user at all (and/or where such information should be collected). For example, a user might specify that his presence should only be monitored in selected locations or from selected devices, and the control server 110 might remove that user's profile from the search universe when provided with presence information from a device not at the selected location or from a device other than one of the selected devices. More generally, the user preferences can include any types of parameters related to collecting presence information, using presence information, handling media content recommendations, handling advertisements, and/or serving content/information (including, without limitation, user account information, user content, user profile information, the user's personal communications (e.g., videomail, voicemail, e-mail, etc.), media content, advertisements, and/or the like). These preferences might be stored in a user profile at the control server 110, which might also include other user-specific information, such as the user's normal location(s); identifying information (such as MAC address, etc.) of other user devices owned by or associated with the user; lists of or links to content owned by the user; lists of or links to media content recommendations; lists of or links to preferences for handling media content recommendations; lists of or links to advertisements; lists of or links to products or services associated with advertisements; lists of or links to preferences for handling advertisements; and/or the like.

In some embodiments, user preferences might specify how the user would like his or her user devices to participate (or not) in a distributed infrastructure arrangement. For instance, the user preferences might include, without limitation: preferences indicating whether or not to allow a user device owned by the user to be used for distributed infrastructure; preferences indicating what types of software applications, customer data, media content (of other user device users and/or subscribers of a cloud service), and/or advertisements are permitted to be hosted on a user device owned by the user; preferences indicating the amount of resources of a user device to dedicate to the distributed infrastructure; etc. In some embodiments, in addition to indicating how a user's user device may be used in a distributed infrastructure implementation, user preferences might allow a user to indicate how the user's own applications, data, and/or media content may be hosted on other users' user devices. For example, the user might be given the option to encrypt any and/or all personal data, any and/or all personal applications, any and/or all files or lists indicating which media content are associated with the user, any and/or all files or lists pertaining to media content recommendations and/or preferences thereof, and/or any and/or all files or lists pertaining to advertisements and/or preferences thereof. Common media content (which might include popular media content, or any other media content) may remain unencrypted for common usage by any number of users on any number of user devices, subject only to any subscription, rental, or purchase restrictions on the particular media content as associated with any user and/or any user device. On the other hand, the user's personal communications (including, e.g., videomail messages and/or the like), preferences for media content recommendations, past decisions/patterns/history with regard to media content viewed/listened to/played by the user, preferences for advertisements, and/or the like may be encrypted.

The control server 110 can provide a user interface (which can be used by users of the ICDs 105, the video calling devices 105, and/or the PDDs 105, and/or the like). The control server 110 might also provide machine-to-machine interfaces, such as application programming interfaces (“APIs”), data exchange protocols, and the like, which can allow for automated communications with the video calling devices 105 and/or the PDDs 105, etc. In one aspect, the control server 110 might be in communication with a web server 125 and/or might incorporate the web server 125, which can provide the user interface, e.g., over the network to a user computer (not shown in FIG. 1), and/or a machine-to-machine interface. In another aspect, the control server 110 might provide such interfaces directly without need for a web server 125. Under either configuration, the control server 110 provides the user interface 120, as that phrase is used in this document. In some cases, some or all of the functionality of the control server 110 might be implemented by the ICD 105, the video calling device 105, and/or the PDD 105 itself.

In an aspect, the user interface 120 allows users to interact with the control server 110 and, by extension, the ICDs 105, the video calling devices 105, and/or the PDDs 105. A variety of user interfaces may be provided in accordance with various embodiments, including, without limitation, graphical user interfaces that display, for a user, fields on display screens for providing information to and/or receiving user input from the user.

Merely by way of example, in some embodiments, the control server 110 may be configured to communicate with a user computer (not shown in FIG. 1) via a dedicated application running on the user computer; in this situation, the user interface 120 might be displayed by the user computer based on data and/or instructions provided by the control server 110. In this situation, providing the user interface might comprise providing instructions and/or data to cause the user computer to display the user interface. In other embodiments, the user interface may be provided from a web site, e.g., by providing a set of one or more web pages, which might be displayed in a web browser running on the user computer and/or might be served by the web server 125. As noted above, in various embodiments, the control server 110 might comprise the web server 125 and/or be in communication with the web server 125, such that the control server 110 provides data to the web server 125 to be incorporated in web pages served by the web server 125 for reception and/or display by a browser at the user computer.

The network 115, specific examples of which are described below with regard to FIG. 11, can be any network, wired or wireless, that is capable of providing communication between the control server 110 and the ICDs 105, the video calling devices 105, and/or the PDDs 105, and/or of providing communication between the control server 110 (and/or the web server 125) and a user computer. In a specific embodiment, the network 115 can comprise the Internet, and/or any Internet service provider (“ISP”) access networks that provide Internet access to the control server 110, the user computer, and/or the ICDs 105, the video calling devices 105, and/or the PDDs 105.

In some embodiments, the system 100 can include a cloud storage system 130, which can be used, as described in further detail below, to store advertisements, presence information, images, video, videomail messages, media content, media content recommendations, determined advertisements, preferences for advertisements, preference information of users, past viewing/listening/playing patterns or decisions of users, and/or the like that are monitored/captured, downloaded, streamed, and/or uploaded by the ICDs 105, the video calling devices 105, and/or the PDDs 105, and/or the like. In some cases, the cloud storage system 130 might be a proprietary system operated by an operator of the control server 110. In other cases, the cloud storage system 130 might be operated by a third party provider, such as one of the many providers of commercially available cloud services. In yet a further embodiment, the cloud storage system 130 might be implemented by using resources (e.g., compute, memory, storage, network, etc.) shared by a plurality of video calling devices, and/or by a plurality of PDDs, that are distributed among various users of the system. Merely by way of example, as described in further detail below and in the '360 application (already incorporated by reference herein), a plurality of user video calling devices and/or PDDs might each have some dedicated resources (such as a storage partition), which are dedicated for use by the system, and/or some ad hoc resources (such as network bandwidth, memory, compute resources, etc.) that are available to the system when not in use by a user. Such resources can be used as cloud storage and/or can be used to provide a distributed, cloud-like platform on which a control server can run as a virtual machine, cloud container, and/or the like.

According to some embodiments, ICD 105, video calling device 105, and/or PDD 105 might comprise a first video input interface to receive first video input from a first local content source (which in some embodiments can include an STB and/or the like) and a first audio input interface to receive first audio input from the first local content source. Video calling device 105 might further comprise a first video output interface to provide first video output to a first video display device and a first audio output interface to provide first audio output to a first audio receiver. In some cases, the first video display device and the first audio receiver might be embodied in the same device (e.g., a TV with a built-in speaker system, or the like). With the input and output interfaces, video calling device 105 might provide pass-through capability for video and/or audio between the first local content source and the first display device. In some instances, high-definition multimedia interface (“HDMI”) cables or other suitable HD signal cables may be used to provide the interconnections for the pass-through. Video calling device 105 may, in some cases, comprise a first image capture device to capture at least one of first image data or first video data and a first audio capture device to capture first audio data. Video calling device 105 may also comprise a first network interface, at least one first processor, and a first storage medium in communication with the at least one first processor.

In some aspects, a plurality of ICDs, PDDs, or video calling devices 105 might be communicatively coupled together in a network (e.g., network 115), each ICD, PDD, or video calling device being located in one of a plurality of customer premises. For implementing distributed infrastructure for cloud computing, cloud-based application hosting, and/or cloud-based data storage, a computer might establish one or more ICDs, PDDs, or video calling devices 105 of the plurality of ICDs, PDDs, or video calling devices 105 as distributed infrastructure elements and might provide at least one of one or more software applications, customer data, and/or media content to the one or more video calling devices 105 for hosting on the one or more video calling devices 105. These and other functionalities of the video calling devices related to distributed infrastructure are described in greater detail in the '360 application (already incorporated by reference herein).

Merely by way of example, in some aspects, a user can remotely access one or more ICDs, PDDs, or video calling devices 105 and/or remotely access at least one of the user's master account, the user's user preferences, the user's profiles, any videomail messages addressed to the user, the user's media content, media content recommendations for the user, determined advertisements, preferences for advertisements, and/or the like over a network. For example, in a web-based implementation, a user could log into the user's master account by accessing a website hosted on a web server (e.g., web server 125, which might be hosted on a cloud server, hosted on distributed PDDs, hosted on distributed video calling devices, and/or the like) and entering commands into a user interface (e.g., user interface 120) associated with remotely accessing the user's video calling device(s) 105 and/or associated with remotely accessing at least one of the user's master account, the user's user preferences, the user's profiles, any videomail messages addressed to the user, the user's media content, media content recommendations for the user, determined advertisements for the user, the user's preferences for advertisements, and/or the like. In some instances, the user might access and interact with the user interface over the network (e.g., network 115) by using a user computer selected from a group consisting of a laptop computer, a desktop computer, a tablet computer, a smart phone, a mobile phone, a portable computing device, and/or the like. In an application-based (or “app-based”) implementation, the user might interact with a software application (or “app”) running on the user's user device, which might include, without limitation, a laptop computer, a desktop computer, a tablet computer, a smart phone, a mobile phone, a portable computing device, and/or the like. The app might include another user interface (similar to the web-based user interface) that might allow for access of the user's video calling device(s) (or any paired video calling device(s)) over the network (e.g., network 115) and/or that might allow for access to at least one of the user's master account, the user's user preferences, the user's profiles, any videomail messages addressed to the user, the user's media content, media content recommendations for the user, determined advertisements for the user, the user's preferences for advertisements, and/or the like.

According to some embodiments, control server 110, which can have any suitable hardware configuration (an example of which is described below with respect to FIG. 11), might be a computer that is capable of receiving user input via a user interface 120 and/or performing operations for controlling the user device(s) 105 (which in some cases might comprise inline camera(s), which in turn might comprise cameras or other sensors, and the like). Merely by way of example, the control server 110 can provide modified apparent views to be inserted in a video stream, and/or the like. In other cases, the control server 110 can receive and/or store user input and/or user preferences that can specify whether and how presence information should be used.

In an aspect of some embodiments, the user might log onto his or her master account at the control server in order to access and/or control inline cameras assigned to that account. The user device 105 and/or the control server 110 might authenticate the user with a set of credentials associated with the master account (e.g., with any of several known authentication schemes, such as a userid/password challenge, a certificate exchange process, and/or the like). Once the user has been authenticated, the user interface can present the user with a variety of different information, including, without limitation, information about the status of inline cameras (or user devices 105 comprising the inline cameras) assigned to the master account to which the user has logged on, options for controlling such inline cameras, and/or the like.

Thus, in some aspects, the user device 105 and/or the control server 110 might receive user preferences (e.g., via a network, such as the Internet, to name one example), and in particular user preferences relating to the collection and/or use of presence information, including, without limitation, preferences such as those described above. The user device 105 and/or the control server 110 can further control and/or configure the inline camera, based at least in part on the user preferences. Merely by way of example, the user might have specified that the inline camera should not be used to collect presence information at all, in which case that feature might be turned off at the inline camera. Alternatively and/or additionally, the user might have specified some limitations on the collection of presence information (such as about whom such information may be collected, times at which information can be collected, and/or purposes for which information may be collected, to name a few examples). Of course, in some embodiments, these preferences can be set directly at the inline camera, e.g., through a menu system displayed on a video device. It should also be recognized that some preferences (such as with whom presence information can be shared) might not affect the inline camera and might be saved and/or operated on at the control server instead.

The amount of control imposed by the control server 110 can vary according to embodiment and implementation. Merely by way of example, as noted above, in some embodiments, there might be no control server, and the inline camera might incorporate all the functionality described herein with regard to the control server 110. In other embodiments, the control server 110 might provide fairly fine-grained control over the inline camera, such as instructing the camera to capture images for purposes of determining presence, and/or the control server 110 may receive the images directly and perform the presence determination procedures at the control server. The division of responsibility between the control server 110 and the inline camera or user device 105 can fall anywhere along this spectrum. In some cases, for instance, the control server 110 might provide the user preferences to the inline camera, which then is responsible for collecting presence information in accordance with those preferences and transmitting the presence information to the control server 110, which takes the appropriate action in response to the presence information, such as selecting an advertisement based on the presence information. Alternatively and/or additionally, the inline camera itself might be responsible for taking such actions.

In some cases, the user device or inline camera might collect presence information. A variety of operations might be involved in the collection of presence information. For example, in some cases, the inline camera captures one or more images of at least a portion of a room where it is located. Such images can be digital still images, a digital video stream, and/or the like. Collecting presence information can further comprise analyzing one or more of the images. Merely by way of example, the images might be analyzed with facial recognition software, which can be used to determine the number of people in the room with the inline camera and/or to identify any of such people (e.g., by determining a name, an age range, a gender, and/or other identifying or demographic information about a user, based on the output of the facial recognition software). Alternatively and/or additionally, analyzing the images can comprise determining that a person is watching a display device, for example using eye-tracking software to identify a focus area of the person's eyes and correlating that focus area with the location of a television. In some cases, if the number of people and the identities (or at least demographic characteristics) of each of the people in the room can be determined, analyzing the images can further include determining a collective demographic of the people in the room (based, for example, on the demographic characteristics of a majority of people in the room).
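As one possible illustration of the image-analysis step, the sketch below counts faces in a captured frame using OpenCV's stock Haar-cascade detector as a stand-in for the facial recognition software; no particular library is specified above, and the camera index is an assumption.

```python
import cv2

# Stock frontal-face detector shipped with OpenCV, standing in for the
# facial recognition software described above.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def count_people(frame):
    """Return the number of faces found in one captured frame, as a
    proxy for the number of people in the room."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=5)
    return len(faces)

# Example: grab one frame from the inline camera (device index 0 is an
# assumption) and report how many people appear to be present.
capture = cv2.VideoCapture(0)
ok, frame = capture.read()
if ok:
    print(f"people detected: {count_people(frame)}")
capture.release()
```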

In some embodiments, the user device (or inline camera) 105 might determine a position(s) of a user(s) relative to a display device in communication with the user device (or inline camera) 105. The user device (or inline camera) 105 and/or the control server 110 might adjust an apparent view of video and/or image(s) on the display device in response to the determined position(s) of the user(s) relative to the display device. In some cases, the user device (or inline camera) 105 and/or the control server 110 might adjust audio (which might be associated with the video and/or image(s), or might be stand-alone audio) in response to the determined position(s) of the user(s) relative to the display device. This technique allows for tracking movement of the user(s) and can, in some cases, provide real-time or near-real-time adjustment of video, image, and/or audio in response to the determined updated position(s) of the user(s).

In some aspects, server 110 might perform the methods described in detail with respect to FIGS. 2-9 below, while data associated with user account(s) or preferences, data associated with monitored user(s), and/or data associated with monitored media content might be collected by the one or more user devices 105, by server 110, or by any combination of these computing devices. The database 130 might store some or all of these collected data.

Aside from the techniques described above, the user devices 105 and/or the server 110 might perform any functions that are described in detail in any of the Related Applications and/or in the '182 patent, which are already incorporated herein by reference in their entirety for all purposes.

To illustrate these concepts, consider FIGS. 2 and 3. FIG. 2 illustrates a scenario 200 in which a camera or ICD 205 captures a scene. That camera has a fixed field of view 210, which might define an angle 215 that is rotated 360 degrees about an axis that is normal to the lens of the camera or ICD 205. The fixed field of view 210 generally cannot be modified unless the settings or orientation of the camera are manually modified. In contrast, however, as illustrated by the scenario 300 of FIG. 3, a scene viewed on a display 320 by a user's eye 305 will have an ideal field of view 310, which is a function of the user's position (in three dimensions) and time. In some cases, the ideal field of view 310 might define an angle 315 that is rotated 360 degrees about an axis that is normal to the lens of the user's eye 305. In some embodiments, a camera or ICD 205 might be designed to have a field of view that defines an angle 215 that matches or exceeds angle 315.

To make the displayed scene more realistic and lifelike, the field of view 310 (and the corresponding perspective) must depend on the user's position at any given time, and must change if the user's position changes. (As used herein, the term “position,” when referring to a user, can either refer generally to a user's position or can refer more specifically to the position of the user's eyes, or a proxy thereof, such as the centroid of an ellipse that encompasses the user's eyes.)

FIGS. 4A-4F (collectively, “FIG. 4”) are general schematic diagrams illustrating techniques for adjusting an apparent field of view of a display device, in accordance with various embodiments. For example, as illustrated by FIGS. 4A-4C, the apparent field of view is increased when the user is closer and decreased when the user is farther away. The display side portion (shown in FIG. 4A) shows the side on which the user 405 is located and on which the display device 410 displays content (including, without limitation, images/video captured from the capture side, and/or the like) to the user 405. The position of the user 405 (and/or the user's eyes) may be tracked by camera 415a. The capture side portion (shown in FIG. 4B) shows the side on which another party to a video call is located or the side on which a live video stream is captured (or the like). The other party to the video call or the objects of the live video stream may be captured by camera 415b. The capture side shows the maximum field of view (“FOV”) 420 (shown as a pair of solid lines in FIG. 4B) that the camera 415b captures, as well as the various FOVs 425 and 430 that the camera 415b captures in various situations.

On the display side (FIG. 4A), the user 405 is shown in two different positions: position P₁ (which is located a distance d₁ from the face of the display device 410) and position P₂ (which is located a distance d₂ from the face of the display device 410). In position P₁, the viewer is close to the display device 410. This corresponds to a wider field of view 425, as shown (as a pair of dot-dash lines) in the capture side figure (FIG. 4B). In position P₂, the viewer is further from the display device 410. This corresponds to a narrower field of view 430, as shown (as a pair of dash lines) in the capture side figure (FIG. 4B). Although two positions are shown, the techniques described herein allow for tracking the user 405 through any number of positions relative to the display device.

FIG. 4C depicts the effective FOVs of the user 405 when the user 405 is located at positions P₁ and P₂, for instance. In FIG. 4C, one might treat display device 410 as if it were a virtual window looking into the capture side (in a sense, through the “peephole” of camera 415b). For example, on the display side, when the user 405 is at position P₁ (i.e., at a distance d₁ from the display device 410), the user's effective FOV 425′ might ideally extend from the display side, beyond display device 410, to the capture side. Because the camera 415b might effectively act as a peephole or the like, in order to display an appropriate FOV 425 on the display device 410 to simulate this ideal, effective FOV 425′, objects within FOV 420 should ideally be at least on plane 435 that is parallel to a face of the camera 415b (which, from a functional perspective, might have a position that is effectively (though not actually) behind display device 410) or extend outward from camera 415b beyond plane 435. In this manner, it may be ensured that objects within the FOV 420 may be captured in images/video. Any objects or portions of objects between camera 415b and plane 435 may not be fully captured (or indeed captured at all), thus resulting in a somewhat unnatural image/video that is displayed on the display device, which would not effectively simulate a virtual window. In some cases, the user device or control server might use image processing techniques to remove such objects (or partially image-captured objects) from the resultant displayed video or image(s).

When the user 405 moves to position P₂ (i.e., at a distance d₂ from the display device 410), the user's effective FOV 430′ might ideally extend from the display side, beyond display device 410, to the capture side. For similar reasons as with FOV 425′, to display an appropriate FOV 430 on the display device 410 to simulate this ideal, effective FOV 430′, objects within FOV 420 should ideally be at least on plane 440 that is parallel to a face of the camera 415b or extend outward from camera 415b beyond plane 440.
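The geometry behind FIGS. 4A-4C can be illustrated with the standard window relation: a viewer at distance d from a window of width w sees through it a horizontal field of view of 2·atan(w/(2d)), which widens as d shrinks. The sketch below applies that relation with purely illustrative numbers.

```python
import math

def apparent_fov_degrees(window_width_m, distance_m):
    """Horizontal FOV seen through a 'window' of the given width by a
    viewer at the given distance (simple pinhole/window geometry)."""
    return math.degrees(2 * math.atan(window_width_m / (2 * distance_m)))

# Closer viewer -> wider apparent FOV, as with positions P1 vs. P2 above.
for d in (1.0, 2.0, 4.0):  # distances in meters, illustrative only
    print(f"d = {d} m -> FOV = {apparent_fov_degrees(1.2, d):.1f} deg")
```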

FIGS. 4D-4F illustrate this process for horizontal movements of the user 405. As shown in FIG. 4D, camera 415a might be used for determining the user's 405 position relative to (a face of) display device 410 (and can be used to transmit video or other media content to the user 405, as well, for example, as part of a video call or the like). The horizontal position is relative to the display side camera 415a. In the display side portion of the figure (FIG. 4D), position P₁ indicates a horizontal offset (by distance x) from the centerline (which defines a line that is normal to a face of the camera 415a or that is normal to the face of the display device 410). The FOV 425 for this offset position is shown (as a pair of dot-dash lines) in the capture side figure (FIG. 4E). For reference, the FOV of position P₂ is also shown. Position P₂ corresponds to one in which the user is not horizontally offset relative to the display side camera (i.e., is aligned with the centerline). The FOV 430 for this non-offset position is shown (as a pair of dash lines) in the capture side figure (FIG. 4E). In both these examples, the user 405 remains at a constant distance y from the display device 410.

Like FIG. 4C, FIG. 4F depicts the effective FOVs of the user 405 when the user 405 is located at positions P₁ and P₂, for example. In FIG. 4F, as in FIG. 4C, one might treat display device 410 as if it were a virtual window looking into the capture side (in a sense, through the “peephole” of camera 415b). For example, on the display side, when the user 405 is at position P₁ (i.e., positioned to the right at a distance x from the centerline), the user's effective FOV 425′ might ideally extend from the display side, beyond display device 410, to the capture side, with the FOV 425′ shifted to the left. To display an appropriate FOV 425 on the display device 410 to simulate this ideal, effective FOV 425′, objects within FOV 420 should ideally be at least on plane 435 that is parallel to a face of the camera 415b or extend outward from camera 415b beyond plane 435. In this manner, it may be ensured that objects within the FOV 420 may be captured in images/video. Any objects or portions of objects between camera 415b and plane 435 may not be fully captured (or indeed captured at all), thus resulting in a somewhat unnatural image/video that is displayed on the display device, which would not effectively simulate a virtual window.

When the user 405 moves to position P₂ (i.e., at a distance x from position P₁, aligned along the centerline, and at a distance y from display device 410), the user's effective FOV 430′ might ideally extend from the display side, beyond display device 410, to the capture side. For similar reasons as with FOV 425′, to display an appropriate FOV 430 on the display device 410 to simulate this ideal, effective FOV 430′, objects within FOV 420 should ideally be at least on plane 440 that is parallel to a face of the camera 415b or extend outward from camera 415b beyond plane 440.

Although not shown, vertical movements of the user 405 relative to the display device 410 may be tracked, and the FOV may be adjusted in a similar manner as described above with respect to FIGS. 4D-4F.
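A rough sketch of the offset mapping in FIGS. 4D-4F (and its vertical analogue) follows. The sign flip captures the described window behavior, while the gain constant is purely an illustrative assumption; in practice it would follow from the capture-side geometry.

```python
def window_center_offset(eye_x_m, eye_z_m, pixels_per_meter=300.0):
    """Map the viewer's horizontal (x) and vertical (z) offset from the
    display centerline to a window-center shift, in pixels of the
    capture-side frame. Note the sign flip: move right, window moves
    left; move up, window moves down."""
    return (-eye_x_m * pixels_per_meter, -eye_z_m * pixels_per_meter)

print(window_center_offset(0.25, 0.0))   # viewer right -> window left
print(window_center_offset(0.0, -0.10))  # viewer down  -> window up
```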

A number of techniques can be used to detect the position of the user (or, as noted above, more precisely, the user's eyes) along any combination of three dimensions. Merely by way of example, in some embodiments, the location of the viewer's eyes on the display side can be detected (or estimated) by one or more techniques, including, but not necessarily limited to: (a) distance sensors (including, without limitation, lidar sensors, radar sensors, sonar sensors, and/or the like); (b) facial recognition techniques; (c) point-locating devices (e.g., remote controls, headsets, glasses, and/or similar devices); (d) silhouette detection; (e) eye tracking techniques; and/or (f) other techniques. The analysis techniques to determine the user's position can be performed by a video calling device (or other user device) that captures the video of the user, by a control server, by a video calling device (or other user device) that is used to record the video to be displayed to the user, or by a combination of these devices.
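As an example of technique (b) above, viewer distance can be estimated from a detected face using the pinhole relation distance = f·W/w, where f is the focal length in pixels, W the real face width, and w the face width in the image. The focal-length and face-width constants below are assumptions for illustration only.

```python
FOCAL_LENGTH_PX = 700.0  # assumed, from a one-time camera calibration
FACE_WIDTH_M = 0.15      # assumed average adult face width

def estimate_distance_m(face_width_px):
    """Pinhole-model distance estimate from a face bounding-box width."""
    return FOCAL_LENGTH_PX * FACE_WIDTH_M / face_width_px

print(f"{estimate_distance_m(105):.2f} m")  # -> about 1.00 m
```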

FIGS. 5A and 5B (collectively, “FIG. 5”) are general schematic diagrams illustrating techniques for adjusting apparent fields of view of a display device for multiple users, in accordance with various embodiments. The display side portion (shown in FIG. 5A) shows the side on which the users 505a and 505b (collectively, “users 505”) are located and on which the display device 510 displays content (including, without limitation, images/video captured from the capture side, and/or the like) to the users 505. The positions of the users 505 (and/or the users' eyes) may be tracked by camera 515a. The capture side portion (shown in FIG. 5B) shows the side on which another party to a video call is located or the side on which a live video stream is captured (or the like). The other party to the video call or the objects of the live video stream may be captured by camera 515b. The capture side shows the maximum field of view (“FOV”) 520 (shown as a pair of solid lines in FIG. 5B) that the camera 515b captures, as well as the various FOVs 525 and 530 that the camera 515b captures in various situations for each of the users 505a and 505b.

On the display side (FIG. 5A), camera 515a might be used for determining the first user's 505a position relative to (a face of) display device 510 (and can be used to transmit video or other media content to the first user 505a, as well, for example, as part of a video call or the like). The horizontal position is relative to the display side camera 515a. In the display side portion of the figure (FIG. 5A), position P₁ indicates a horizontal offset (by distance x) from the centerline (which defines a line that is normal to a face of the camera 515a or that is normal to the face of the display device 510). The FOV 525 for this offset position is shown (as a pair of dot-dash lines) in the capture side figure (FIG. 5B). Likewise, camera 515a might be used for determining the second user's 505b position relative to (a face of) display device 510 (and can be used to transmit video or other media content to the second user 505b, as well, for example, as part of a video call or the like). In the display side portion of the figure (FIG. 5A), position P₂ is shown aligned with the centerline. The FOV 530 for this non-offset position is shown (as a pair of dash lines) in the capture side figure (FIG. 5B). In both these examples, the users 505 remain at a constant distance y from the display device 510 (although the various embodiments are not so limited, and one of the users 505 may be positioned closer to the display device 510 than the other).

In some embodiments, in order for both users 505 to view the different FOVs 525 and 530, various techniques may be used, including, but not limited to, the use of active glasses. Based at least in part on time synchronization with the display device 510, one pair of active glasses (worn by one user) can receive one FOV while the other pair of active glasses (worn by the other user) blocks that particular FOV, and vice versa, such that the eyes of each user receive only the images/video corresponding to one FOV and not the other. Such a technique of using active glasses to alternate between frames of displayed content to display different FOVs is described in detail below with respect to FIG. 8.
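A simplified sketch of that time-multiplexing idea follows, with simple counters standing in for the two rendered FOV streams; a real active-glasses system synchronizes the shutters to the display's refresh, which is abstracted away here.

```python
from itertools import count

def interleave_fovs(fov_a_frames, fov_b_frames):
    """Yield (frame, intended_viewer) pairs; glasses synchronized to
    the display pass only the frames meant for their wearer."""
    for frame_a, frame_b in zip(fov_a_frames, fov_b_frames):
        yield frame_a, "user_a"  # glasses B shutter closed
        yield frame_b, "user_b"  # glasses A shutter closed

stream = interleave_fovs((f"A{i}" for i in count()),
                         (f"B{i}" for i in count()))
for _ in range(4):
    print(next(stream))  # -> ('A0', 'user_a'), ('B0', 'user_b'), ...
```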

Similar to the above, a number of techniques can be used to adjust a field of view (“FOV”) to correspond to the viewer's position. One technique is the creation of a windowed field of view, as depicted by FIG. 6, which is a general schematic diagram illustrating a windowed field of view in relation to a sensor field of view, in accordance with various embodiments. In FIG. 6, a sensor field of view (“FOV”) 605 is shown in relation to a windowed FOV 610. The sensor FOV 605 represents the FOV that is achieved by a sensor at the capture side, while the windowed FOV 610 represents the FOV that is displayed on a display device at the display side.

The video stream that is captured can be the entire FOV (referred to, in some embodiments herein, as the “maximum field of view”), or can be a subset that is smaller and can be positioned arbitrarily (or to correspond to the viewer's position) within the full sensor field of view. This is denoted the “windowed FOV” in FIG. 6. If the full FOV is captured, the video can be cropped to produce the desired windowed FOV.

Thus, one approach is to adjust the windowed FOV 610 on the capture side camera to something other than the full FOV, and in a manner that corresponds to the position of the viewer's eyes on the display side. One way to do this is to send the coordinates of the viewer's eyes to the capture side. This could be done in a peer-to-peer fashion and/or might be facilitated via a server. Merely by way of example, in some embodiments, peer-to-peer sessions might be initiated using a server, and after a peer-to-peer session has been initiated or established by the server, the server may be bypassed, resulting in a direct peer-to-peer connection or session. This could also be done via networking protocols such as TCP, UDP, RTP, XMPP, SIP, or others. Once the capture side camera has the coordinates of the viewer's eyes, the windowed FOV 610 (which in this case represents the camera's or sensor's FOV) can be adjusted accordingly, and the image that is seen on the display side would adjust based on the position of the viewer's eyes.
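By way of example, the following sketch sends the viewer's eye coordinates to the capture side over UDP, one of the transport options named above; the address, port, and message layout are assumptions for illustration.

```python
import json
import socket

CAPTURE_SIDE = ("192.0.2.10", 5005)  # placeholder address (RFC 5737 range)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_eye_coordinates(x, y, z):
    """Fire-and-forget datagram carrying the viewer's eye position in
    display-relative coordinates (here, meters)."""
    payload = json.dumps({"x": x, "y": y, "z": z}).encode()
    sock.sendto(payload, CAPTURE_SIDE)

send_eye_coordinates(0.12, -0.03, 1.8)  # viewer slightly right, 1.8 m away
```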

An alternative approach would be to have the capture side always send the full FOV 605 to the display side. With this approach, the video communications device on the display side would manipulate the video stream to display a windowed version that is a subset of the full FOV and that corresponds to the position of the viewer's eyes. The advantages of this approach are that no additional network communication is required and that the latency between any viewer movements and the image adjustment on the display side would be reduced.
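A minimal sketch of this display-side approach follows, cropping a windowed subset out of a full-FOV frame. The frame and window dimensions are illustrative, and positioning the window center from the viewer's eyes is left to the offset mapping sketched earlier.

```python
import numpy as np

def crop_windowed_fov(full_frame, window_w, window_h, center_x, center_y):
    """Cut the windowed FOV out of the full-FOV frame; the window center
    is given in pixels of the full frame and clamped to stay in bounds."""
    h, w = full_frame.shape[:2]
    left = int(np.clip(center_x - window_w // 2, 0, w - window_w))
    top = int(np.clip(center_y - window_h // 2, 0, h - window_h))
    return full_frame[top:top + window_h, left:left + window_w]

full = np.zeros((1080, 1920, 3), dtype=np.uint8)  # stand-in full-FOV frame
window = crop_windowed_fov(full, 1280, 720, center_x=800, center_y=500)
print(window.shape)  # -> (720, 1280, 3)
```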

For example, as depicted by FIG. 6, the windowed FOV 610 is moved left when the user moves (and/or the user's eyes move) right, and/or is moved down when the user moves (and/or the user's eyes move) up. Similarly, although not shown, the windowed FOV 610 is moved right when the user moves (and/or the user's eyes move) left, and/or is moved up when the user moves (and/or the user's eyes move) down. Although not shown, the user (and/or the user's eyes) moving in any combination of left, right, up, and/or down relative to the display device will result in the windowed FOV 610 being moved in the corresponding combination of right, left, down, and/or up, respectively.

Yet another approach is to have a camera on the capture side that has a physical mechanism for the adjustment of the field of view (i.e., pan, tilt, zoom, etc.). If the camera has such capability, then when the coordinates of the viewer's eyes are sent across the network to the capture side, the camera's position can physically be adjusted (by any suitable combination of panning, tilting, zooming, and/or the like) to produce an image that is appropriate for the viewer's eyes. In some cases, the capture side device might feature an array of cameras (as shown, e.g., in FIG. 7B), which can expand the field of view that can be captured. The images from one or more cameras can be combined and processed to produce a larger field of view than a single camera alone (as shown, e.g., in FIG. 7A). Camera arrays can be used to form a composite image using the images from one or more cameras. This composite image can have a virtual perspective that is different from that of any of the individual cameras. The virtual perspective can be set to create a perspective based on the location of the viewer. For example, the perspective can be with respect to the viewer and his or her display.

FIGS. 7A and 7B (collectively, “FIG. 7”) are general schematic diagrams illustrating a display device 700 in use with one or more image capture devices, in accordance with various embodiments. In FIG. 7, display device 700 might comprise housing 705, display screen 705a, displayed or windowed FOV 710, and image-captured object(s) 715 (which in the embodiments shown in FIG. 7 might include a call participant in a video call, or the like). Also shown in FIG. 7 are one or more image capture devices (“ICDs”) or cameras 720; in FIG. 7A, a single ICD or camera 720 is shown, while, in FIG. 7B, a plurality of ICDs or cameras 720 are shown (although five ICDs or cameras 720a-720e are shown, this is merely for illustration, and any suitable number of ICDs or cameras 720 may be used). As described above, multiple ICDs or cameras 720 (which may be arranged in an array(s)) can be used to form a composite image using the images captured by the plurality of ICDs or cameras 720. The composite image may represent one frame in a series of frames of a video (such as in a video call, movie content, television content, a live video stream, etc.).

Also shown in FIG. 7A is a plane 725 that is parallel to a plane defined by (the screen 705a or face of) the display device 700. Axes x and z represent the horizontal and vertical axes, respectively. In some embodiments, determining a position of a first user (who might be a viewer or a first party to a video call, or the like) might comprise determining a horizontal position of the first user in a horizontal dimension (e.g., along the x-axis) of the plane 725, which is parallel to the face of the display device. In such embodiments, adjusting an apparent or windowed FOV might comprise panning the video in a horizontal direction (i.e., along the x-axis) or moving the windowed FOV in the horizontal direction, based on the determined horizontal position of the first user. In particular, when the user moves (and/or the user's eyes move) right along the positive x direction, the windowed FOV is moved left (along the negative x direction), and vice versa. In a similar manner, determining a position of the first user might comprise determining a vertical position of the first user in a vertical dimension (e.g., along the z-axis) of the plane 725, which is parallel to the face of the display device. In such embodiments, adjusting an apparent or windowed FOV might comprise panning the video in a vertical direction (i.e., along the z-axis; sometimes referred to as “tilting”) or moving the windowed FOV in the vertical direction, based on the determined vertical position of the first user. In particular, when the user moves (and/or the user's eyes move) up along the positive z direction, the windowed FOV is moved down (along the negative z direction), and vice versa.

We now turn to FIG. 8, which illustrates a functional diagram of a system 800 for modifying an apparent view(s) of displayed content, based at least in part on sensed presence and/or determined position(s) of a user in a room, in accordance with one set of embodiments. The skilled reader should note that the arrangement of the components illustrated in FIG. 8 is functional in nature, and that various embodiments can employ a variety of different structural architectures. Merely by way of example, one exemplary, generalized architecture for the system 800 is described below with respect to FIGS. 10 and 11, but any number of suitable hardware arrangements can be employed in accordance with different embodiments.

In FIG. 8, an ICD 805 might correspond to ICD 105, video calling device105, and/or PDD 105, while user device 845 might correspond to non-ICDuser device 105, non-video calling device user device 105, or non-PDDuser device 105, as described in detail above with respect to FIG. 1.Control server 810, network 815, and cloud storage system 830, in theexample of FIG. 8, might correspond to control server 110, network 115,and cloud storage system 130, respectively, as described in detail abovewith respect to FIG. 1.

System 800 might further comprise a local content source 835 (e.g., a local content source as described above), a display device 840 (including, without limitation, a television ("TV"), a computer monitor, and/or the like), and high-definition ("HD") data cables 850 (or any other suitable data transmission media). In some cases, the HD data cables 850 might include, without limitation, high-definition multimedia interface ("HDMI") cables. One or more of the ICDs 805 (e.g., the first ICD 805 a and the second ICD 805 b, as shown in FIG. 8) might be configured to provide pass-through audio and/or video from a local content source 835 to a display device 840 (e.g., using data cables 850). Merely by way of example, in some embodiments, an HDMI input port in the ICD 805 allows HD signals to be input from the corresponding local content source 835, and an HDMI output port in the ICD 805 allows HD signals to be output from the ICD 805 to the corresponding display device 840 (e.g., a TV, which might include, but is not limited to, an Internet Protocol TV ("IPTV"), an HDTV, a cable TV, or the like). The output HD signal may, in some cases, be the input HD signal as modified by the ICD 805. Local content source 835 might be any suitable local content source. As noted above, a local content source can be any device that provides an audio or video stream to a display device, and thus can include, without limitation, a cable or satellite STB, an IPTV STB, and devices that generate video and/or audio (and/or acquire video and/or audio from other sources, such as the Internet) and provide that video/audio to a display device; hence, a local content source can include devices such as a video game console, a Roku® streaming media player, an AppleTV®, and/or the like. Hence, when situated functionally inline between a local content source and a display device, the ICD 805 can receive an audiovisual stream output from the local content source, modify that audiovisual stream in accordance with the methods described in the '182 patent, and provide the (perhaps modified) audiovisual stream as input to the display device 840. In some embodiments, first ICD 805 a, local content source 835 a, display device 840 a, and user device 845 a (if any) might be located at a first customer premises 860 a, while second ICD 805 b, local content source 835 b, display device 840 b, and user device 845 b (if any) might be located at a second customer premises 860 b. According to some embodiments, a user device 845 might be located at a customer premises 860 or might be a portable user device (including, without limitation, a tablet computer, a laptop computer, a smart phone, a mobile phone, a portable gaming device, and/or the like) that is not bound to any particular customer premises 860.

According to some embodiments, system 800 might further comprise one or more access points (not shown), each of which might be located in proximity to or in the first customer premises 860 a or the second customer premises 860 b. The access point(s) can allow wireless communication between each ICD 805 and network 815. (Of course, an ICD 805 might also have a wired connection to an access point, router, residential gateway, etc., such as via an Ethernet cable, which can provide similar communication functionality.) In some cases (as shown), each ICD 805 might be communicatively coupled to network 815 (via either a wired or a wireless connection), without routing through any access points. In some cases, wired or wireless access to network 815 allows ICD 805 to obtain profiles from cloud storage system 830 and/or media content from first content server 870 and/or database 875, independently of the corresponding local content source 835, which is in communication with a content distribution network 865 (via either a wireless or a wired connection). In some cases, content distribution network 865 (which could be, for example, a cable television distribution network, a satellite television distribution network, an Internet Protocol television ("IPTV") distribution network, and/or the like) might be communicatively coupled with second content server 880, and thus local content source 835 might obtain media content from second content server 880 and media content database 885 independently of ICD 805. Alternatively or in addition, the content distribution network 865 might be communicatively coupled to other content servers (e.g., first content server 870 or the like) and/or other media content sources (e.g., database 875 or the like).

In this manner, ICD 805 can overlay the input signal from the corresponding local content source 835 with additional media content to produce an augmented output HD signal to the corresponding display device 840 via data cables 850. This functionality allows for supplemental content (which may be associated with the media content accessed by the local content source 835 for display on display device 840) to be accessed and presented using the first ICD 805, in some cases, as a combined presentation on the display device 840, which may be one of an overlay arrangement (e.g., a picture-in-picture ("PIP") display, with the supplemental content overlaid on the main content), a split screen arrangement (with the supplemental content adjacent to, but not obscuring, any portion of the main content), a passive banner stream (with non-interactive supplemental content streaming in a banner(s) along one or more of a top, bottom, left, or right edge of a display field in which the main content is displayed on display device 840), and/or an interactive banner stream (with interactive supplemental content streaming in a banner(s) along one or more of a top, bottom, left, or right edge of a display field in which the main content is displayed on display device 840). Herein, examples of interactive supplemental content might include, without limitation, content that, when streamed in a banner, can be caused to slow, stop, and/or replay within the banner, in response to user interaction with the content and/or the banner (as opposed to passive banner streaming, in which information is streamed in a manner uncontrollable by the user). The interactive supplemental content that is streamed in the banner may, in some instances, also allow the user to invoke operations or functions by interacting therewith; for example, by highlighting and/or selecting the supplemental content (e.g., an icon or still photograph of a character, actor/actress, scene, etc. associated with the main content), the user may invoke links to related webpages, links to further content stored in media content database 875, or operations to display related content on display device 840 and/or user device 845. In some embodiments, the interactive supplemental content might include notifications or messages relating to recommendations of media content, the determination and generation of which are described in detail above. According to some embodiments, the interactive supplemental content (whether related or unrelated to the media content being presented) might include advertisement content.

In some instances, ICD 805 might detect the presence and/or proximity of one or more user devices 845 associated with the user, and might (based on user profile information associated with the user that is stored, e.g., in cloud storage system 830) automatically send supplemental media content via wireless link 855 (directly from ICD 805 or indirectly via an access point (not shown)) for display on a display screen(s) of the one or more user devices 845. In one non-limiting example, a user associated with first ICD 805 a might have established a user profile stored in cloud storage system 830 that indicates a user preference for any and all supplemental content for movies and television programs to be compiled and displayed on one or more user devices 845 a (including, but not limited to, a tablet computer, a smart phone, a laptop computer, and/or a desktop computer, etc.) concurrent with display of the movie or television program being displayed on display device 840 a. In such a case, when a movie is playing on display device 840 a (broadcast or streamed via local content source 835 a from content server 870 and media content database 875, and/or from some other content server and some other media content source, via network 865), first ICD 805 a accesses supplemental content (if available) from content server 870 and media content database 875 via network 815, and sends the supplemental content to the user's tablet computer and/or smart phone via wireless link(s) 855. For example, bios of actors, actresses, and/or crew might be sent to the user's smart phone for display on the screen thereof, while schematics of machines, weapons, robots, tools, etc. associated with the movie or television show might be sent to and displayed on the user's tablet computer; behind-the-scenes videos or information, news/reviews associated with the main content, and/or music videos associated with the main content may also be sent to the user's smart phone and/or tablet computer; and so on.

Merely by way of example, in some embodiments, first media content might be received by local content source 835 a (in customer premises 860 a) from media content database 875 via content server 870 and content distribution network 865. The first ICD 805 a might provide pass-through capability for displaying video aspects (in some cases, audio aspects as well) of the first media content from the local content source 835 a. As the first media content passes through the first ICD 805 a, the first ICD 805 a might monitor the media content, and might generate or select advertisements based at least in part on the monitored media content. Alternatively, or in addition, the first ICD 805 a might comprise sensors (e.g., camera, microphone, proximity sensors, user device sensors, communications links, etc.) that monitor the user(s) within the same room, e.g., to monitor or track reactions of each user (including, but not limited to, vocal expressions or outbursts, facial expressions, hand gestures, body gestures, eye movement, eye focus, shift in proximity with respect to the PDD, and/or the like), using any number or combination of techniques, including, without limitation, facial recognition techniques, facial expression recognition techniques, mood recognition techniques, emotion recognition techniques, voice recognition techniques, vocal tone recognition techniques, speech recognition techniques, eye movement tracking techniques, eye focus determination techniques, proximity detection techniques, and/or the like. The first ICD 805 a might determine advertisements based at least in part on the monitored reactions of each user.

In some instances, the first ICD 805 a might send the informationassociated with the monitored media content and/or informationassociated with the monitored reactions of each user to control server810 over network 815, and control server 810 might determine or generaterecommendations for media content, based at least in part on themonitored media content and/or based at least in part on the monitoredreactions of each user, which is described in detail (along with otherembodiments of media content recommendation, or the like) in the '435application (already incorporated herein by reference in its entirety).In some embodiments, control server 810 might determine (i.e., selectand/or generate) advertisements based at least in part on the monitoredmedia content and/or based at least in part on the monitored reactionsof each user, which is described in detail (along with other embodimentsof advertisement determination, or the like) in the '133 and '603applications (already incorporated herein by reference in theirentirety).

According to some embodiments, the detection of the presence of the user device 845 by the first ICD 805 a or the second ICD 805 b might allow identification of a user and thus access of profiles, content, and/or messages and notifications associated with the user's account, regardless of whether the first ICD 805 a or the second ICD 805 b is owned by and/or associated with the user. Herein, the user's media content might include, without limitation, at least one of purchased video content, purchased audio content, purchased video game content, purchased image content, rented video content, rented audio content, rented video game content, rented image content, user-generated video content, user-generated audio content, user-generated video game content, user-generated image content, and/or free media content, while the user's profiles might include, but are not limited to, one or more of user profile information for a video game or video game console, web browser history and/or bookmarks, contact information for the user's contacts, and user profile information for video or audio content, including without limitation recommended content, device preferences, messaging preferences, videomail preferences, user profile information for cloud services, and/or the like. Videomail, herein, might refer to videomail messages addressed to the user or callee. In some cases, the user's profile might also include identifying information, including, but not limited to, the user's biometric information (e.g., facial characteristics, voice characteristics, fingerprint characteristics, iris characteristics, pupil characteristics, retinal characteristics, etc.), the user's past monitored reactions (e.g., vocal expressions or outbursts, facial expressions, hand gestures, body gestures, eye movement, eye focus, shift in proximity with respect to the PDD, and/or the like), or the like. In some examples, the user profile information for cloud services might include user log-in information (e.g., username, account number, and/or password/passphrase, etc.) or other suitable credentials for cloud services, which might include, without limitation, video calling service, videomail service, voice calling service, video broadcast/streaming service, audio broadcast/streaming service, on-line gaming service, banking/financial services, travel/accommodation/rental vehicle services, and/or dining/entertainment event reservation/ticketing services, or the like.

In one example, a user might be associated with first ICD 805 a (locatedin the first customer premises 860 a), while her friend might beassociated with second ICD 805 b (located in the second customerpremises 860 b), and the user and the friend are both subscribers of asimilar service provided by control server 810 and/or the cloud serviceprovider associated with control server 810. When the user visits herfriend, the friend's ICD 805 b might first detect presence of the user,by querying and/or obtaining the identification information for theuser's smart phone and/or tablet computer or the like, by capturingvideo, image, and/or voice data of the user, by infrared detection of aliving person in the room, and/or by audio detection of a living personin the room, etc. The friend's ICD 805 b might then identify the userusing the user's device(s) identification information and/or thecaptured video, image, and/or voice data, or might send such presenceinformation to control server 810 for identification and authenticationanalysis. In some cases, detecting presence of, oridentifying/authenticating, the user might include, without limitation,analyzing captured images or video segments using one or more of facialrecognition software, pupil/iris recognition software, retinalidentification software, fingerprint analysis software, and/orphysiology recognition software, analyzing captured audio samples usingone or more of voiceprint analysis and/or comparison with storedchallenge/response information, and/or identification of a user deviceowned by and/or associated with the user (e.g., based on identificationinformation of the device, which may be previously associated with theuser or the user's profile(s), etc.). In terms of detection of thepresence of the user's device, any suitable technique may be implementedincluding, but not limited to, at least one of detecting a Bluetoothconnection of the user device, detecting that the user device isassociated with a WiFi access point with which the video calling devicehas associated, and/or communicating with the user device using nearfield communication (“NFC”).

Once the user has been identified and authenticated, control server 810 might send copies of the user's profiles and/or content to the second ICD 805 b (either from first ICD 805 a and/or from cloud storage system 830, or the like), or at least provide the user with access to her profiles, notifications of media content recommendations, notifications of determined advertisements, preferences for advertisements, videomail, and/or content from her friend's ICD 805 b. In some embodiments, the identification and authentication processes might include comparing the user device identification information and/or the captured video, image, and/or voice data against all similar identification data for all users/subscribers of the cloud service that are stored in cloud storage system 830. In some cases, the process might be facilitated where ICDs 805 a and 805 b might already be associated with each other (e.g., where the user has previously made a video call from first ICD 805 a to her friend on second ICD 805 b, where the user might have added the friend to the user's contact list, and/or where the friend might have added the user to the friend's contact list). In other cases, the user's first ICD 805 a might have access to the user's calendar and/or communications, which might indicate that the user is visiting the friend. The first ICD 805 a might query control server 810 to determine whether the friend has an ICD 805 b associated with the cloud service provider. In this example, the first ICD 805 a determines that second ICD 805 b is part of the same service and/or is in communication with control server 810, and based on such determination, first ICD 805 a (and/or control server 810) might send the user's profiles and/or content to second ICD 805 b, and/or provide second ICD 805 b with access to the user's profiles, notifications of media content recommendations, notifications of determined advertisements, preferences for advertisements, videomail, and/or content. In some embodiments, the user's profiles, notifications of media content recommendations, notifications of determined advertisements, preferences for advertisements, videomail, and/or content, or access to profiles, notifications of media content recommendations, notifications of determined advertisements, preferences for advertisements, videomail, and/or content, might be encrypted, and might be released/decrypted upon identification and/or authentication by second ICD 805 b (and/or by control server 810) when the user is detected by second ICD 805 b. In this manner, the user's profiles, notifications of media content recommendations, notifications of determined advertisements, preferences for advertisements, videomail, and/or content can follow the user wherever she goes, so long as there is a device (e.g., PDD or video calling device) that is associated with the same or an affiliated cloud service provider at her destination, and so long as the device can recognize and authenticate the user.

By the same token, if the user is no longer detected by the second ICD 805 b, either after a predetermined number of prompts or queries for the user or after a predetermined period of time (e.g., after a specified number of minutes, hours, days, weeks, months, etc.), second ICD 805 b (and/or control server 810) might determine that the user is no longer present at the location of second ICD 805 b. Based on such a determination, second ICD 805 b and/or control server 810 might remove the user's profiles, notifications of media content recommendations, notifications of determined advertisements, preferences for advertisements, videomail, and/or media content (or access thereto) from second ICD 805 b. As described above, a time-out system might be utilized. Alternatively, other suitable systems may be used for determining that the user is no longer present, and for removing the user's profiles, notifications of media content recommendations, notifications of determined advertisements, preferences for advertisements, videomail, and/or media content (or access thereto) from the second ICD 805 b. In some cases, once the user is determined to no longer be present at the location of the second ICD 805 b, the system might either stop presenting the advertisement(s) (if currently being presented) or not present the advertisement(s) (if not yet presented).

In some embodiments, system 800 might provide virtual windowfunctionality. In other words, system 800 might modify an apparentview(s) of displayed content, based at least in part on sensed presenceand/or determined position(s) of a user in a room. For example, in thecase of media content presentation (e.g., presentation of one of moviecontent, television program content, video content, image content,gaming content, and/or the like), first ICD 805 a might determine orcollect presence and/or position information about a user with respectto the display device 840 a. In some cases, first ICD 805 a and/orcontrol server 810 might modify an apparent view of the media content(either from first content server 870 and database 875 via network 815or from second content server 880 and database 885 via local contentsource 835 a and network 865, or the like) that is displayed on displaydevice 840 a, based at least in part on the position information of theuser, similar to the techniques as described above with respect to FIGS.1-7.

For example, if the user moves closer to the display device 840 a, the first ICD 805 a might determine and/or collect the changed position of the user relative to the display device 840 a, and the first ICD 805 a and/or the control server 810 might modify the apparent view of the media content displayed on display device 840 a by increasing the apparent field of view of the media content displayed. Conversely, if the user moves further away from the display device 840 a, the first ICD 805 a might determine and/or collect the changed position of the user relative to the display device 840 a, and the first ICD 805 a and/or the control server 810 might modify the apparent view of the media content displayed on display device 840 a by decreasing the apparent field of view of the media content displayed. If the user moves left with respect to the display device 840 a, the first ICD 805 a might determine and/or collect the changed position of the user relative to the display device 840 a, and the first ICD 805 a and/or the control server 810 might modify the apparent view of the media content displayed on display device 840 a by proportionally changing the apparent field of view of the media content displayed toward the right (in some cases, by proportionally changing an apparent perspective of the media content toward the right; herein, changing an apparent perspective of the media content might include changing the apparent field of view such that the apparent view of the media content is panned or tilted with respect to a previous apparent view of the media content, or otherwise modifying the apparent view so that the image/video displayed appears to have been captured from a different angle). If the user moves right with respect to the display device 840 a, the first ICD 805 a might determine and/or collect the changed position of the user relative to the display device 840 a, and the first ICD 805 a and/or the control server 810 might modify the apparent view of the media content displayed on display device 840 a by proportionally changing the apparent field of view of the media content displayed toward the left (in some cases, by proportionally changing an apparent perspective of the media content toward the left).
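
One simple way to model the distance-dependent part of this behavior is an inverse relationship between viewing distance and apparent field of view. The sketch below is a minimal, assumed model (the reference distance being a hypothetical calibration value), not the only way an embodiment might realize the effect:

```python
def apparent_fov_scale(reference_distance, user_distance):
    """Scale factor for the apparent field of view: a viewer
    closer than the reference distance sees a wider FOV
    (scale > 1); a viewer farther away sees a narrower one,
    as with a physical window."""
    return reference_distance / max(user_distance, 1e-6)
```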

According to some embodiments, the user may move in any combination ofcloser/further, left/right, up/down, and/or the like with respect todisplay device 840 a, over a period of time (e.g., during presentationof at least a portion, if not the entirety, of the media content), andthe ICD 805 a can track such movements, and the ICD 805 a and/or thecontrol server 810 can modify the apparent view of the displayed mediacontent accordingly (despite the combination of the movements), suchthat the resultant apparent fields of view track the movements of theuser, to provide a more natural display, not unlike looking out aphysical window while changing one's position relative to the window(hence, in some cases, the display device that displays modified oradjusted apparent views of content according to this technique might bereferred to as a “virtual window”). In some cases, the modification ofthe apparent view of the displayed media content might be performed inreal-time or near real-time (i.e., with minimal, almost imperceptiblelag).

For video calls, virtual window functionality may be achieved in a similar manner. Here, the ICD 805 associated with a particular call participant might determine and/or collect presence information about the corresponding call participant, and that ICD and/or control server 810 might modify the apparent view of the corresponding video feed of the other call participant accordingly. In a non-limiting example, a caller at the first customer premises 860 a might initiate, using first ICD 805 a, a video call with a callee at the second customer premises 860 b. After the video call has been established between first ICD 805 a and second ICD 805 b (perhaps via control server 810 and network 815), first ICD 805 a might display video feeds of the callee on display device 840 a, while second ICD 805 b might display video feeds of the caller on display device 840 b. During the call, the caller might shift position with respect to display device 840 a (say, for example, moving a bit closer and to the left with respect to the display device 840 a). First ICD 805 a might track this movement, and first ICD 805 a and/or control server 810 might modify the apparent view of the callee displayed on display device 840 a in one of several ways. In one set of embodiments, modifying the apparent view might include, but is not limited to, sending instructions to second ICD 805 b to perform at least one of panning to the right, zooming in on the callee, and/or increasing the apparent field of view. In another set of embodiments, second ICD 805 b might normally send a maximum field of view to the first ICD 805 a and/or control server 810, which might normally reduce the apparent field of view prior to displaying the video feed on display device 840 a. In such cases, modifying the apparent view might include, without limitation, changing the apparent field of view within the maximum field of view that is sent from second ICD 805 b, by simulating at least one of panning to the right, zooming in on the callee, and/or increasing the apparent field of view.

Likewise, if the callee changes her position with respect to displaydevice 840 b, second ICD 805 b might track the movement, and second ICD805 b and/or control server 810 might modify the apparent view of thecaller displayed on display device 840 b in a similar manner asdescribed above with respect to the modification of the apparent view ofthe callee displayed on display device 840 a.

In some embodiments, rather than a single camera or single image capture device 805 being used at each of the customer premises 860, multiple cameras or multiple image capture devices (in some cases, arranged in an array(s)) may be used, and a composite image/video with a composite field of view (both maximum and displayed) may be generated (either by ICD 805 and/or by control server 810). In such embodiments, modification of the apparent view may be performed by modifying the composite image/video and/or modifying the composite field of view, or the like. In order for the composite image/video and/or the composite field of view to appear to be a single coherent image/video and/or field of view from a single image capture device, some image processing of the image or of the frames of the video might be necessary to ensure that the stitching of the different images/frames of video is seamless. This is especially important for three-dimensional ("3-D") images/video that have been collected or captured by different image capture devices (and thus have different fields of view).

Merely by way of example, although the above embodiments have beendescribed with respect to single users for each ICD 805 (or each displaydevice 840), the various embodiments are not so limited, and multipleusers or viewers may be accommodated. In some embodiments, toaccommodate multiple users, techniques not unlike those used for 3-Dtelevisions or 3-D movies may be implemented. In one non-limitingexample, each user viewing a display device 840 might wear glasses, notunlike active 3-D glasses. For active glasses, the glasses might each bein wireless communication (e.g., infrared communication, Bluetoothcommunication, WiFi communication, and/or the like) with the ICD 805,and the timing of each device may be synchronized by the ICD 805. Afirst viewer might wear a first pair of active glasses, while a secondviewer might wear a second pair of active glasses, and a third viewermight wear a third pair of glasses. In one set of non-limiting examples,the ICD 805 might send a first frame of video to be displayed on thedisplay device 840, and while the first frame of video is displayed, thefirst pair of active glasses might be set to not block (i.e., to allow)light that is received from the frame, but each of the second and thirdpairs of active glasses might be set to block the light received fromthe frame. The ICD 805 might then send a second frame of video to bedisplayed on the display device 840, and while the second frame of videois displayed, the second pair of active glasses might be set to notblock (i.e., to allow) light that is received from the frame, but eachof the first and third pairs of active glasses might be set to block thelight received from the frame. In a similar manner, the ICD 805 mightsend a third frame of video to be displayed on the display device 840,and while the third frame of video is displayed, the third pair ofactive glasses might be set to not block (i.e., to allow) light that isreceived from the frame, but each of the first and second pairs ofactive glasses might be set to block the light received from the frame.The fourth frame of video might be treated in the same manner as thefirst frame, while the fifth frame might be treated in the same manneras the second frame, and the sixth frame might be treated in the samemanner as the third frame, and so on.

Each of the frames of video might be modified in a manner similar to the above that takes into account the relative positions of each of the first through third viewers relative to display device 840. In this way, to the first viewer, the images displayed on the display device 840 and perceived through the first pair of glasses closely reflect an apparent field of view as if the first viewer were looking through a real window (or in this case, a virtual window), despite moving relative to the window. The second and third viewers might perceive similar effects from their respective positions relative to the display device 840.

Although this set of examples describes the system as applying to only three viewers, the various embodiments are not so limited, and any suitable number of viewers may be supported (say, n viewers). For n viewers, the first viewer might receive, through the first pair of active glasses, the first, (n+1)th, (2n+1)th, etc. frames of the video, while the nth viewer might receive, through the nth pair of active glasses, the nth, 2nth, 3nth, etc. frames of the video. The ICD 805 may also adjust the frame rate to ensure seamless display of the video. Currently, for example, 24 frames per second (or 24 Hz) is a standard frame rate for film; 60i (interlaced, which is effectively about 30 frames per second) is a current standard frame rate for U.S. television broadcasts; 50p or 60p (progressive, effectively about 50 or 60 frames per second) is currently used in high-end HDTV systems; and so on. Higher frame rates (as well as other frame rates) are also being tested. The ICD 805, in some cases, might adjust the overall frame rate to be higher, in order to account for the n viewers, such that each viewer receives an effective frame rate that is one of the same as, half of, a third of, a quarter of, or a fifth of one of these frame rates, or the like.
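
A minimal sketch of this frame-multiplexing arithmetic (using 0-based frame indices; function names are illustrative only) might look like the following:

```python
def frames_for_viewer(viewer_index, n_viewers, total_frames):
    """Frame indices whose light viewer k's active glasses pass:
    viewer 0 sees frames 0, n, 2n, ...; viewer k sees frames
    k, n + k, 2n + k, ... (0-based indexing)."""
    return list(range(viewer_index, total_frames, n_viewers))

def required_display_rate(per_viewer_rate, n_viewers):
    """Overall display frame rate needed so that each of the
    n viewers still perceives per_viewer_rate frames/second."""
    return per_viewer_rate * n_viewers
```

Under this scheme, for example, three viewers each perceiving 60 frames per second would require the display (and the glasses synchronization) to run at 180 Hz.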

In some cases, for video calls, one side might have a single user, while the other side might have multiple users. The single-user side might function in a manner similar to that described above for single users, while the multiple-user side might function in a manner similar to that described above for multiple users. In some cases, both sides of the video call might have multiple, but different, numbers of users (for example, one side might have n users, while the other has m users, or the like). In most cases, the ICD 805 might determine whether a multiple-user situation exists by determining the presence of more than one user, and in some instances by determining, with eye tracking techniques, how many users are actually viewing the display device 840. The ICD 805 then appropriately signals the active glasses of the users to deliver the appropriate frames of the video to each user, to allow for individualized perceptions of the virtual window, as described in detail above.

For multi-party video calls, similar techniques might apply. For example, in a 3-party video call, each display device might be split into two panels, each showing one of the other two parties. In such cases, depending on the positions on the display device at which the panels are arranged, the apparent view of each panel might be modified accordingly. For instance, if the panels are arranged side by side, the center of each panel would be off-center with respect to the display device, and the ICD 805 and/or the control server 810 might modify the field of view of the left panel as if the viewer were shifted to the right, and might modify the field of view of the right panel as if the viewer were shifted to the left. For panels that are arranged one on top of the other, the ICD 805 and/or the control server 810 might determine the relational positions of the viewer's eyes with respect to the centers of each of the panels, and might modify the apparent views displayed in the panels accordingly. Although the example above only discusses a 3-party call, any number of parties may be on the video call (and any number of participants may be present at each party's location). Although these examples are directed to adjacent and aligned panels, the various embodiments are not so limited, and the panels may be arranged in any relative position on the display screen with respect to each other. In some cases, one panel might be made smaller than another panel, or the like.
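
The per-panel adjustment reduces to re-expressing the viewer's position relative to each panel's center rather than the display's center. A minimal sketch (with illustrative names) of that re-expression:

```python
def panel_relative_position(viewer_xz, panel_center_xz):
    """Viewer position re-expressed relative to a panel's center,
    so that each panel of a multi-party call can have its apparent
    view modified as if the viewer were offset by this amount."""
    vx, vz = viewer_xz
    cx, cz = panel_center_xz
    return (vx - cx, vz - cz)
```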

According to some embodiments, 3-D video content may similarly be displayed to a single viewer or to multiple viewers. For a single user, half of the frames might be directed to the left eye of the user, while the other half of the frames might be directed to the right eye of the user, in alternating fashion. For multiple viewers, for each of the frames described above for the n viewers, two frames would be permitted to pass through each viewer's pair of active glasses (one to only the left eye of the viewer and the other to only the right eye of the viewer). The left-eye view and the right-eye view would be appropriately generated and/or modified such that, when combined, the two eye views provide the desired depth information to form 3-D views.
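
Combining the stereo pairing with the n-viewer multiplexing gives a simple routing rule. The sketch below (a hypothetical helper, 0-based indexing) maps a global frame index to its intended viewer and eye:

```python
def route_3d_frame(frame_index, n_viewers):
    """For stereoscopic multiplexing, consecutive frame pairs go
    to one viewer: the even frame of the pair to the left eye,
    the odd frame to the right eye."""
    viewer = (frame_index // 2) % n_viewers
    eye = "left" if frame_index % 2 == 0 else "right"
    return viewer, eye
```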

FIG. 9 is a process flow diagram illustrating a method 900 of providing a virtual window or for modifying an apparent view(s) of displayed content, based at least in part on sensed presence and/or determined position(s) of a user in a room, in accordance with various embodiments. While the techniques and procedures of FIG. 9 are depicted and/or described in a certain order for purposes of illustration, it should be appreciated that certain procedures may be reordered and/or omitted within the scope of various embodiments. Moreover, while the method illustrated by FIG. 9 can be implemented by (and, in some cases, is described below with respect to) the systems 100, 1000, and/or 1100 of FIGS. 1, 10, and/or 11, respectively (or components thereof), the method may also be implemented using any suitable hardware implementation. Similarly, while each of the system 100 (and/or components thereof) of FIG. 1, the system 1000 (and/or components thereof) of FIG. 10, and/or the system 1100 (and/or components thereof) of FIG. 11 can operate according to the method illustrated by FIG. 9 (e.g., by executing instructions embodied on a computer readable medium), the systems 100, 1000, and/or 1100 can also operate according to other modes of operation and/or perform other suitable procedures.

According to some embodiments, the method 900 might comprise, at block905, capturing (e.g., with a video calling device or other user device)an image or video of a user(s), who might be a first party to a videocall or who might simply be the viewer of a video stream (or stillimage), such as a television program, video game, live stream of aremote scene, and/or the like. If the user is a party to a video call,this captured video can be transmitted to another video calling devicein a remote location used by another party to the video call (block910), as described in the '182 patent, for example.

The method 900 can further comprise identifying one or more features within the captured image/video (block 915). Merely by way of example, the method 900 might include processing video with facial recognition software, silhouette detection software, eye-tracking software, and/or the like. At block 920, the method 900 can include determining a position of the user(s) with respect to a display device (or speakers, or any other defined point). In some cases, the spatial relationship between the user device (or other camera) used to capture the image/video and the display device might be known (such as, for example, if both the camera and the display are integrated into a single device, or if the user device is designed to be placed on top of the display device). In other cases, the user might specify the relative positions of these devices (e.g., in a guided setup operation and/or by configuring user preferences on the user device). In some cases, the user device (or other camera) used to capture the image/video and/or the display device might communicate with each other or with a server computer over a local or wider network to determine relative positions (either by exchanging location information, if each device has such capability, and/or by using triangulation techniques or similar techniques, or the like). In other cases, the location of the user device can be used as a proxy for the location of the display device itself. Hence, the user's position with respect to the user device can be used to derive or estimate the user's position with respect to the display device.
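
As one concrete (and purely illustrative) possibility for the feature-identification step, a face detector can yield the user's normalized offset from the image center, which, where the camera sits on or near the display, serves as a proxy for the user's x/z position relative to the screen. The sketch below uses OpenCV's stock Haar cascade; the detection thresholds and the assumption that the largest face belongs to the viewer are arbitrary choices:

```python
import cv2

# Stock face detector shipped with opencv-python.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_offset_from_center(frame_bgr):
    """Return the largest detected face's center offset from the
    image center, normalized to [-1, 1] in x (right positive)
    and z (up positive), or None if no face is found."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    img_h, img_w = gray.shape
    nx = ((x + w / 2.0) - img_w / 2.0) / (img_w / 2.0)
    nz = (img_h / 2.0 - (y + h / 2.0)) / (img_h / 2.0)
    return nx, nz
```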

According to some embodiments, a known object (e.g., an object packaged with one or more of the image capture device, user device, display device, video calling device, and/or the like) might be placed within the field of view of the image capture device; because the dimensions of the object are already known, a determination of the relative size of the image-captured object can be used to determine distance to the object, and the object can be used as a point of reference for determining distance and/or position of the user(s). In some instances, the known object might be a wearable object (such as a pin, brooch, button, etc. that might be affixed to clothing of the user). In some embodiments, the known object need not be on the user, or even particularly close to the user; image analysis (e.g., lighting analysis, shadow analysis, and/or the like) might be used to determine relative positions between the user and the known object. In some cases, any object may be calibrated to serve as such a known object and point of reference. According to some embodiments, sonar, lidar, or other similar techniques might be used to determine distances and/or relative positions of the user(s) with respect to the image capture device and/or the display device.
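
The known-object range estimate follows directly from the pinhole camera model. In the minimal sketch below, the focal length in pixels is assumed to be known from calibration:

```python
def distance_from_known_object(focal_length_px, real_width_m,
                               observed_width_px):
    """Pinhole-model range estimate: an object of known physical
    width appears smaller with distance, so
        distance = focal_length_px * real_width / observed_width."""
    return focal_length_px * real_width_m / max(observed_width_px, 1e-6)
```

For example, a 0.05 m wearable button spanning 40 pixels under an 800-pixel focal length would be estimated at about one meter from the camera.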

To determine the user's position with respect to the user device (e.g., video calling device), a number of techniques can be used. For example, as noted above, the position of the user in three dimensions can be used to adjust the apparent view of the displayed video. Two of the dimensions can be considered the horizontal and vertical dimensions in a plane parallel to the display device (and/or in a plane normal to the visual axis from the user's position to the focal point of the camera on the user device). FIG. 7A, for example, shows a plane 725 that is parallel to the display device, and the axes x and z represent the horizontal and vertical dimensions, respectively. The third dimension (i.e., dimension y, as shown, e.g., in FIGS. 4D, 4F, and 5A) is the distance along that axis from the user to the focal point of the camera. To determine the user's position in the first two dimensions (e.g., the x and z dimensions), the identified features in the captured video/image of the user (as described above) can be used to identify a position in both dimensions. To determine the user's position in the third dimension (e.g., the y dimension), any of a number of distance estimation techniques can be used, including, without limitation, laser rangefinding, parallax focusing, and/or the like.

The method 900, then, can comprise adjusting the apparent view of the displayed video (e.g., a video call, video game, media content, etc.), based on the determined position of the viewing user (block 925). Adjusting the apparent view of the video can comprise one or more of several operations. Merely by way of example, in some cases, adjusting the apparent view can comprise adjusting the apparent FOV (that is, the field of view that the user perceives when viewing the video) to correspond to the user's position(s) relative to the display device (block 930). This adjustment can be performed by creating a windowed FOV (as noted above with respect to FIG. 6), and/or it can include panning, tilting (or vertical panning), and/or zooming a real or virtual camera capturing the video (for example, in a live stream or video call context), and/or it can include adjusting a raw video stream to provide the appropriate apparent field of view.
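
Putting the pieces together, creating a windowed FOV amounts to cropping a viewer-dependent sub-rectangle out of the full (maximum-FOV) frame. The sketch below is one minimal way to do so, reusing the normalized offsets and zoom factor from the earlier sketches; the half-frame base window is an arbitrary assumption:

```python
def windowed_fov(frame, nx, nz, zoom=1.0):
    """Crop a windowed FOV from a full-FOV frame (an array of
    shape [h, w, ...]). nx, nz in [-1, 1] are the viewer's
    normalized offsets; the window slides opposite the viewer
    and widens as zoom grows (viewer moving closer)."""
    h, w = frame.shape[:2]
    cw = min(int(w * 0.5 * zoom), w)   # window width
    ch = min(int(h * 0.5 * zoom), h)   # window height
    # Window center moves opposite the viewer's offset; image y
    # grows downward, so +nz (viewer up) moves the window down.
    cx = int(w / 2.0 - nx * (w - cw) / 2.0)
    cy = int(h / 2.0 + nz * (h - ch) / 2.0)
    x0 = min(max(cx - cw // 2, 0), w - cw)
    y0 = min(max(cy - ch // 2, 0), h - ch)
    return frame[y0:y0 + ch, x0:x0 + cw]
```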

Additionally and/or alternatively, adjusting an apparent view cancomprise adjusting an apparent perspective of the displayed video, i.e.,the perspective that the user perceives when viewing the display, tocorrespond to the user's position relative to the display device (block935). This operation can also be accomplished in a number of ways. Forexample, in a three-dimensional (“3-D”) video feed, the 3-D aspects ofthe video stream can be manipulated to provide an appropriateperspective. In other cases, adjusting the perspective might includemoving a real or virtual camera (either by pan/tilt or throughtranslation of the camera) to capture a displayed scene that correspondsto the user's position relative to the display device. In other cases,if the capturing device comprises an array of two or more cameras, thedevice might create a composite FOV that is a mosaic of the fields ofview of a plurality of those cameras. The selection of cameras that areused to create the composite FOV can be changed to adjust theperspective given to the captured (and displayed) video and the apparentperspective offered to the user.

As noted above, in some cases, adjusting the view might compriseprocessing the captured video to effect the adjustment (either at thecapturing device, the displaying device, or a control server, or at acombination of two or more of those devices), and the method 900,accordingly, can comprise modifying a video signal (with any of suchdevices) to adjust the apparent view of the displayed video (block 940).Alternatively and/or additionally, as noted above, the position and/orbehavior of cameras at the capturing device can be adjusted to effectthose changes, and the method 900, therefore, can include sendinginstructions from a displaying device (or a control server) to thecapturing device to adjust the camera(s) accordingly (block 945),receiving such instructions at the capturing device (block 950), and/orcontrolling one or more cameras in accordance with the receivedinstructions (block 955).

In some cases, certain embodiments are configured to provide real-time(or near real-time) adjustments to the apparent view of the displayedvideo. In such embodiments, the user device on the viewer side can beconfigured to continually and/or periodically monitor the position ofthe user relative to the display device, and if the user devicedetermines that the user has moved (block 960), the system can modifythe apparent view of the displayed video (block 965), e.g., using thetechniques described above, as shown by the flow continuing back toblock 930.
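
A minimal monitoring loop for this real-time behavior might look like the following, where get_position and render stand in (as assumptions) for the device's own sensing and display paths:

```python
import time

def track_and_render(get_position, render, threshold=0.02, period=0.05):
    """Poll the viewer's position and re-render the windowed view
    only when the viewer has moved perceptibly. `get_position`
    returns a position tuple (or None if the viewer is not
    detected); `render` redraws the apparent view for a position."""
    last = None
    while True:
        pos = get_position()
        moved = pos is not None and (
            last is None or
            max(abs(a - b) for a, b in zip(pos, last)) > threshold)
        if moved:
            render(pos)
            last = pos
        time.sleep(period)  # ~20 Hz polling; tune for imperceptible lag
```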

The reader should note, as indicated above, that the functionalitydescribed with respect to certain system components in the method 900 ofFIG. 9 can be performed by any other system components, as appropriate.Merely by way of example, the video calling device (or other userdevice) at the viewer's location might not have sufficient processingpower to perform some or all of the functions described above, and insuch cases, the control server (or another component) may perform suchfunctions. For instance, the video calling device (or other user device)might capture video of the user and transmit that video to the controlserver (e.g., as part of a video call), and the control server mightanalyze that video for user position information before forwarding it tothe video calling device at the other end of the video call; the controlserver then might provide instructions to the video calling devicecapturing video to be displayed to the user to modify camera behaviorand/or might modify the video signal it receives from that video callingdevice before forwarding it to the viewer's calling device for display.Based on this example, the skilled reader should understand that thefunctionality described herein can be divided among system components inany appropriate manner.

It should also be appreciated that this functionality can be provided at both ends of a video call, such that a video calling device capturing video of a first party to a call can use that video to determine the position of the first party (and adjust the first party's apparent view accordingly), while a video calling device capturing video of a second party to the call can use that video to determine a position of the second party relative to a display device on the second party's end of the call (and adjust the second party's apparent view accordingly). Thus, the video captured of each party can be adjusted for display to the other party as part of the video call, providing a much more lifelike and interesting video calling experience.

Further, as noted above, the reader should understand that the techniques described herein can have utility in a wide variety of applications and are not limited to the examples described above. Merely by way of example, these techniques can be used to provide a more realistic experience in the display of video games (e.g., using cameras or camera arrays in common use with many modern video game consoles), or to provide a virtual window onto a picturesque scene (e.g., Times Square, a nature scene, a child's room, and/or the like) in a remote location, such as in a virtual picture frame in an office. Similar techniques can be used to enhance the presentation of television programs, sports, and/or any other broadcast video, movies, and/or the like.

FIG. 10 provides a schematic illustration of one embodiment of acomputer system 1000 that can perform the methods provided by variousother embodiments, as described herein, and/or can function as a videocalling device, ICD, PDD, user device, control server, server computer,web server, and/or the like. It should be noted that FIG. 10 is meantonly to provide a generalized illustration of various components, ofwhich one or more (or none) of each may be utilized as appropriate. FIG.10, therefore, broadly illustrates how individual system elements may beimplemented in a relatively separated or relatively more integratedmanner.

The computer system 1000 is shown comprising hardware elements that canbe electrically coupled via a bus 1005 (or may otherwise be incommunication, as appropriate). The hardware elements may include one ormore processors 1010, including without limitation one or moregeneral-purpose processors and/or one or more special-purpose processors(such as digital signal processing chips, graphics accelerationprocessors, and/or the like); one or more input devices 1015, which caninclude, without limitation, a mouse, a keyboard, and/or the like; andone or more output devices 1020, which can include, without limitation,a display device, a printer, and/or the like.

The computer system 1000 may further include (and/or be in communicationwith) one or more storage devices 1025, which can comprise, withoutlimitation, local and/or network accessible storage, and/or can include,without limitation, a disk drive, a drive array, an optical storagedevice, solid-state storage device such as a random access memory(“RAM”) and/or a read-only memory (“ROM”), which can be programmable,flash-updateable, and/or the like. Such storage devices may beconfigured to implement any appropriate data stores, including, withoutlimitation, various file systems, database structures, and/or the like.

The computer system 1000 might also include a communications subsystem1030, which can include, without limitation, a modem, a network card(wireless or wired), an infra-red communication device, a wirelesscommunication device and/or chipset (such as a Bluetooth™ device, an802.11 device, a WiFi device, a WiMax device, a WWAN device, cellularcommunication facilities, etc.), and/or the like. The communicationssubsystem 1030 may permit data to be exchanged with a network (such asthe network described below, to name one example), with other computersystems, and/or with any other devices described herein. In manyembodiments, the computer system 1000 will further comprise a workingmemory 1035, which can include a RAM or ROM device, as described above.

The computer system 1000 also may comprise software elements, shown asbeing currently located within the working memory 1035, including anoperating system 1040, device drivers, executable libraries, and/orother code, such as one or more application programs 1045, which maycomprise computer programs provided by various embodiments, and/or maybe designed to implement methods, and/or configure systems, provided byother embodiments, as described herein. Merely by way of example, one ormore procedures described with respect to the method(s) discussed abovemight be implemented as code and/or instructions executable by acomputer (and/or a processor within a computer); in an aspect, then,such code and/or instructions can be used to configure and/or adapt ageneral purpose computer (or other device) to perform one or moreoperations in accordance with the described methods.

A set of these instructions and/or code might be encoded and/or storedon a non-transitory computer readable storage medium, such as thestorage device(s) 1025 described above. In some cases, the storagemedium might be incorporated within a computer system, such as thesystem 1000. In other embodiments, the storage medium might be separatefrom a computer system (i.e., a removable medium, such as a compactdisc, etc.), and/or provided in an installation package, such that thestorage medium can be used to program, configure, and/or adapt a generalpurpose computer with the instructions/code stored thereon. Theseinstructions might take the form of executable code, which is executableby the computer system 1000 and/or might take the form of source and/orinstallable code, which, upon compilation and/or installation on thecomputer system 1000 (e.g., using any of a variety of generallyavailable compilers, installation programs, compression/decompressionutilities, etc.) then takes the form of executable code.

It will be apparent to those skilled in the art that substantialvariations may be made in accordance with specific requirements. Forexample, customized hardware (such as programmable logic controllers,field-programmable gate arrays, application-specific integratedcircuits, and/or the like) might also be used, and/or particularelements might be implemented in hardware, software (including portablesoftware, such as applets, etc.), or both. Further, connection to othercomputing devices such as network input/output devices may be employed.

As mentioned above, in one aspect, some embodiments may employ acomputer system (such as the computer system 1000) to perform methods inaccordance with various embodiments of the invention. According to a setof embodiments, some or all of the procedures of such methods areperformed by the computer system 1000 in response to processor 1010executing one or more sequences of one or more instructions (which mightbe incorporated into the operating system 1040 and/or other code, suchas an application program 1045) contained in the working memory 1035.Such instructions may be read into the working memory 1035 from anothercomputer readable medium, such as one or more of the storage device(s)1025. Merely by way of example, execution of the sequences ofinstructions contained in the working memory 1035 might cause theprocessor(s) 1010 to perform one or more procedures of the methodsdescribed herein.

According to some embodiments, system 1000 might further comprise one ormore sensors 1050, which might include, without limitation, one or morecameras, one or more IR sensors, and/or one or more 3D sensors, or thelike. In some cases, the one or more sensors 1050 might be incorporatedin (or might otherwise be one of) the input device(s) 1015. The outputdevice(s) 1020 might, in some embodiments, further include one or moremonitors, one or more TVs, and/or one or more display screens, or thelike.

The terms “machine readable medium” and “computer readable medium,” asused herein, refer to any medium that participates in providing datathat causes a machine to operate in a specific fashion. In an embodimentimplemented using the computer system 1000, various computer readablemedia might be involved in providing instructions/code to processor(s)1010 for execution and/or might be used to store and/or carry suchinstructions/code (e.g., as signals). In many implementations, acomputer readable medium is a non-transitory, physical, and/or tangiblestorage medium. Such a medium may take many forms, including, but notlimited to, non-volatile media, volatile media, and transmission media.Non-volatile media includes, for example, optical and/or magnetic disks,such as the storage device(s) 1025. Volatile media includes, withoutlimitation, dynamic memory, such as the working memory 1035.Transmission media includes, without limitation, coaxial cables, copperwire and fiber optics, including the wires that comprise the bus 1005,as well as the various components of the communication subsystem 1030(and/or the media by which the communications subsystem 1030 providescommunication with other devices). Hence, transmission media can alsotake the form of waves (including, without limitation, radio, acoustic,and/or light waves, such as those generated during radio-wave andinfra-red data communications).

Common forms of physical and/or tangible computer readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read instructions and/or code.

Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to the processor(s) 1010 for execution. Merely by way of example, the instructions may initially be carried on a magnetic disk and/or optical disc of a remote computer. A remote computer might load the instructions into its dynamic memory and send the instructions as signals over a transmission medium to be received and/or executed by the computer system 1000. These signals, which might be in the form of electromagnetic signals, acoustic signals, optical signals, and/or the like, are all examples of carrier waves on which instructions can be encoded, in accordance with various embodiments of the invention.

The communications subsystem 1030 (and/or components thereof) generally will receive the signals, and the bus 1005 then might carry the signals (and/or the data, instructions, etc. carried by the signals) to the working memory 1035, from which the processor(s) 1010 retrieves and executes the instructions. The instructions received by the working memory 1035 may optionally be stored on a storage device 1025 either before or after execution by the processor(s) 1010.

As noted above, a set of embodiments comprises systems collecting presence information and/or enabling monitoring of media content presentation and determination (e.g., selection or generation) of advertisements, based on presence information (regardless of whether the user device detecting the presence is owned by and/or associated with the user). FIG. 11 illustrates a schematic diagram of a system 1100 that can be used in accordance with one set of embodiments. The system 1100 can include one or more user computers 1105. In particular, a user computer 1105 can be a video calling device, an ICD, a PDD, and/or a user device, as described above. More generally, a user computer 1105 can be a general purpose personal computer (including, merely by way of example, desktop computers, workstations, tablet computers, laptop computers, handheld computers, mobile phones, smart phones, and the like), running any appropriate operating system, several of which are available from vendors such as Apple, Microsoft Corp., as well as a variety of commercially-available UNIX™ or UNIX-like operating systems. A user computer 1105 can also have any of a variety of applications, including one or more applications configured to perform methods provided by various embodiments (as described above, for example), as well as one or more office applications, database client and/or server applications, and/or web browser applications. Alternatively, a user computer 1105 can be any other electronic device, such as a thin-client computer, Internet-enabled mobile telephone, and/or personal digital assistant, capable of communicating via a network (e.g., the network 1110 described below) and/or of displaying and navigating web pages or other types of electronic documents. Although the exemplary system 1100 is shown with two user computers 1105, any number of user computers can be supported.

Certain embodiments operate in a networked environment, which can include a network 1110. The network 1110 can be any type of network familiar to those skilled in the art that can support data communications using any of a variety of commercially-available (and/or free or proprietary) protocols, including, without limitation, TCP/IP, SNA™, IPX™, AppleTalk™, and the like. Merely by way of example, the network 1110 can include a local area network (“LAN”), including, without limitation, a fiber network, an Ethernet network, a Token-Ring™ network, and/or the like; a wide-area network; a wireless wide area network (“WWAN”); a virtual network, such as a virtual private network (“VPN”); the Internet; an intranet; an extranet; a public switched telephone network (“PSTN”); an infra-red network; a wireless network, including, without limitation, a network operating under any of the IEEE 802.11 suite of protocols, the Bluetooth™ protocol known in the art, and/or any other wireless protocol; and/or any combination of these and/or other networks.

Embodiments can also include one or more server computers 1115. Each of the server computers 1115 may be configured with an operating system, including, without limitation, any of those discussed above with respect to the user computers 1105, as well as any commercially (or freely) available server operating systems. Each of the servers 1115 may also be running one or more applications, which can be configured to provide services to one or more clients 1105 and/or other servers 1115.

Merely by way of example, one of the servers 1115 might be a control server, with the functionality described above. In another embodiment, one of the servers might be a web server, which can be used, merely by way of example, to provide communication between a user computer 1105 and a control server, for example, to process requests for web pages or other electronic documents from user computers 1105 and/or to provide user input to the control server. The web server can also run a variety of server applications, including HTTP servers, FTP servers, CGI servers, database servers, Java servers, and the like. In some embodiments of the invention, the web server may be configured to serve web pages that can be operated within a web browser on one or more of the user computers 1105 to perform operations in accordance with methods provided by various embodiments.

The server computers 1115, in some embodiments, might include one or more application servers, which can be configured with one or more applications accessible by a client running on one or more of the client computers 1105 and/or other servers 1115. Merely by way of example, the server(s) 1115 can be one or more general purpose computers capable of executing programs or scripts in response to the user computers 1105 and/or other servers 1115, including, without limitation, web applications (which might, in some cases, be configured to perform methods provided by various embodiments). Merely by way of example, a web application can be implemented as one or more scripts or programs written in any suitable programming language, such as Java™, C, C#™, or C++, and/or any scripting language, such as Perl, Python, or TCL, as well as combinations of any programming and/or scripting languages. The application server(s) can also include database servers, including, without limitation, those commercially available from Oracle™, Microsoft™, Sybase™, IBM™, and the like, which can process requests from clients (including, depending on the configuration, dedicated database clients, API clients, web browsers, etc.) running on a user computer 1105 and/or another server 1115. In some embodiments, an application server can create web pages dynamically for displaying the information in accordance with various embodiments, such as providing a user interface for a control server, as described above. Data provided by an application server may be formatted as one or more web pages (comprising HTML, JavaScript, etc., for example) and/or may be forwarded to a user computer 1105 via a web server (as described above, for example). Similarly, a web server might receive web page requests and/or input data from a user computer 1105 and/or forward the web page requests and/or input data to an application server. In some cases, a web server may be integrated with an application server.
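Merely by way of illustration, a web application of the sort described above might expose an endpoint that returns dynamically generated content to a user computer 1105. The following sketch uses only Python's standard http.server module; the route name /user-position and the hard-coded payload are hypothetical and are not drawn from the specification.

    # Illustrative sketch only: a toy web endpoint of the kind such a
    # web/application server might expose, returning dynamically built
    # content. Standard library only; route and payload are hypothetical.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class AppHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/user-position":
                # In a real deployment this would come from a user
                # device or database server, not be hard-coded.
                body = json.dumps({"user": "demo", "dx": 0.2, "dy": -0.1})
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body.encode())
            else:
                self.send_error(404)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), AppHandler).serve_forever()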

In accordance with further embodiments, one or more servers 1115 can function as a file server and/or can include one or more of the files (e.g., application code, data files, etc.) necessary to implement various disclosed methods, incorporated by an application running on a user computer 1105 and/or another server 1115. Alternatively, as those skilled in the art will appreciate, a file server can include all necessary files, allowing such an application to be invoked remotely by a user computer 1105 and/or server 1115.

It should be noted that the functions described with respect to various servers herein (e.g., application server, database server, web server, file server, etc.) can be performed by a single server and/or a plurality of specialized servers, depending on implementation-specific needs and parameters. Further, as noted above, the functionality of one or more servers 1115 might be implemented by one or more containers or virtual machines operating in a cloud environment and/or a distributed, cloud-like environment based on shared resources of a plurality of user video calling devices, a plurality of ICDs, and/or a plurality of PDDs.

In certain embodiments, the system can include one or more data stores 1120. The nature and location of the data stores 1120 are discretionary: merely by way of example, one data store 1120 might comprise a database 1120a that stores information about master accounts, user profiles, user preferences, assigned video calling devices, viewing/listening/Internet browsing/gaming patterns, viewing/listening/Internet browsing/gaming history, etc. Alternatively and/or additionally, a data store 1120b might be a cloud storage environment for storing master accounts, user profiles, user preferences, uploaded monitored reactions of users, and/or the like.

As the skilled reader can appreciate, the database 1120a and the cloud storage environment 1120b might be collocated and/or separate from one another. Some or all of the data stores 1120 might reside on a storage medium local to (and/or resident in) a server 1115a. Conversely, any of the data stores 1120 (and especially the cloud storage environment 1120b) might be remote from any or all of the computers 1105, 1115, so long as it can be in communication (e.g., via the network 1110) with one or more of these. In a particular set of embodiments, a database 1120a can reside in a storage-area network (“SAN”) familiar to those skilled in the art, and/or the cloud storage environment 1120b might comprise one or more SANs. (Likewise, any necessary files for performing the functions attributed to the computers 1105, 1115 can be stored locally on the respective computer and/or remotely, as appropriate.) In one set of embodiments, the database 1120a can be a relational database, such as an Oracle database, that is adapted to store, update, and retrieve data in response to SQL-formatted commands. The database might be controlled and/or maintained by a database server, as described above, for example.
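Merely by way of illustration, the following sketch shows SQL-formatted storage, update, and retrieval of the kind described for the database 1120a. Python's built-in sqlite3 module stands in for a commercial relational database such as Oracle; the table name and columns are hypothetical, not drawn from the specification.

    # Illustrative sketch only: store/update/retrieve user-profile data
    # with SQL-formatted commands, as described for database 1120a.
    # sqlite3 stands in for a commercial RDBMS; schema is hypothetical.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE user_profiles (
        user_id         TEXT PRIMARY KEY,
        preferences     TEXT,
        viewing_history TEXT)""")

    # Store (or update) a profile.
    conn.execute(
        "INSERT OR REPLACE INTO user_profiles VALUES (?, ?, ?)",
        ("user-001", "subtitles=on", "movie,call,game"))

    # Retrieve it with an ordinary SQL query.
    row = conn.execute(
        "SELECT preferences FROM user_profiles WHERE user_id = ?",
        ("user-001",)).fetchone()
    print(row[0])  # -> subtitles=on
    conn.close()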

As noted above, the system can also include a first ICD 1125 and a second ICD 1130. The first ICD 1125 in the context of the examples described herein corresponds to a device associated with a first user (or first video call participant), while the second ICD 1130 might correspond to a device associated with a second user (or second video call participant). Although only two ICDs are illustrated in FIG. 11, it should be appreciated that any number of ICDs 1125-1130 may be implemented in accordance with various embodiments.

Using the techniques described herein, each of the first ICD 1125 or the second ICD 1130 can determine presence and/or positions of one or more users (or audience members, or call participants, etc.), modify the displayed view based at least in part on the determined presence and/or positions of the one or more users, and/or the like.
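Merely by way of illustration, the following sketch shows one plausible mapping from a determined user position to an adjusted apparent view: the displayed region is a crop window within a larger remote frame that pans opposite the viewer's offset and narrows with viewing distance, so that the display behaves like a physical window, consistent with the horizontal/vertical panning and distance-based zooming described herein. The function name and tuning constants are hypothetical, not prescribed by the specification.

    # Illustrative sketch only: derive a crop window ("apparent view")
    # inside a larger remote frame from the viewer's position. The
    # constants base_zoom and pan_gain are hypothetical tuning values.
    def apparent_view(dx, dy, distance, src_w, src_h,
                      base_zoom=1.5, pan_gain=0.25):
        """dx, dy: viewer offset from screen center in [-1, 1].
        distance: viewer distance from the display, in meters.
        Returns (left, top, width, height) of the crop to display."""
        # A farther viewer sees less of the remote scene, as through
        # a physical window.
        zoom = max(1.0, base_zoom * max(distance, 0.5))
        view_w, view_h = src_w / zoom, src_h / zoom
        # Looking from the right (dx > 0) reveals more of the left
        # side of the remote scene, again as through a window.
        cx = src_w / 2 - dx * pan_gain * src_w
        cy = src_h / 2 - dy * pan_gain * src_h
        left = min(max(cx - view_w / 2, 0), src_w - view_w)
        top = min(max(cy - view_h / 2, 0), src_h - view_h)
        return int(left), int(top), int(view_w), int(view_h)

    # Viewer slightly right of center, 1 m away, 1920x1080 remote frame:
    print(apparent_view(0.4, 0.0, 1.0, 1920, 1080))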

Each of the first ICD 1125 or the second ICD 1130 may be (or may have similar functionality as) a video calling device 105, a user device 105, an ICD 105, or a PDD 105, as described in detail above; in some cases, each of the first ICD 1125 or the second ICD 1130 might be (or may have similar functionality as) a VCD as described in the '182 patent.

While certain features and aspects have been described with respect to exemplary embodiments, one skilled in the art will recognize that numerous modifications are possible. For example, the methods and processes described herein may be implemented using hardware components, software components, and/or any combination thereof. Further, while various methods and processes described herein may be described with respect to particular structural and/or functional components for ease of description, methods provided by various embodiments are not limited to any particular structural and/or functional architecture, but instead can be implemented on any suitable hardware, firmware, and/or software configuration. Similarly, while certain functionality is ascribed to certain system components, unless the context dictates otherwise, this functionality can be distributed among various other system components in accordance with the several embodiments.

Moreover, while the procedures of the methods and processes described herein are described in a particular order for ease of description, unless the context dictates otherwise, various procedures may be reordered, added, and/or omitted in accordance with various embodiments. Further, the procedures described with respect to one method or process may be incorporated within other described methods or processes; likewise, system components described according to a particular structural architecture and/or with respect to one system may be organized in alternative structural architectures and/or incorporated within other described systems. Hence, while various embodiments are described with, or without, certain features for ease of description and to illustrate exemplary aspects of those embodiments, the various components and/or features described herein with respect to a particular embodiment can be substituted, added, and/or subtracted from among other described embodiments, unless the context dictates otherwise. Consequently, although several exemplary embodiments are described above, it will be appreciated that the invention is intended to cover all modifications and equivalents within the scope of the following claims.

What is claimed is:
1. A method, comprising: determining, with a user device comprising a camera, a position of a user relative to a display device in communication with the user device; and adjusting an apparent view of video on the display device in response to the determined position of the user relative to the display device.
2. The method of claim 1, wherein adjusting an apparent view of video on the display device comprises adjusting an apparent field of view of the video to correspond to the determined position of the user relative to the display device.
3. The method of claim 1, wherein adjusting an apparent view of video on the display device comprises adjusting an apparent perspective of the video to correspond to the determined position of the user relative to the display device.
4. The method of claim 1, wherein the user device comprises a video calling device, and wherein the video on the display device comprises a video call.
5. The method of claim 1, wherein the user device comprises a video game console, and wherein the video on the display device comprises a video game.
6. The method of claim 1, wherein the video on the display device comprises one of a video program, a television program, movie content, video media content, audio media content, game content, or image content.
7. The method of claim 1, wherein the video on the display device comprises a live video stream captured by a camera in a location remote from the user device.
8. The method of claim 1, further comprising: adjusting an audio track of the video in response to the determined position of the user relative to the display device.
9. A user device, comprising: a sensor; a processor; and a computer readable medium having encoded thereon a set of instructions executable by the processor to cause the user device to perform one or more operations, the set of instructions comprising: instructions for determining a position of a user relative to a display device in communication with the user device; and instructions for adjusting an apparent view of video on the display device in response to the determined position of the user relative to the display device.
10. The user device of claim 9, wherein the user device comprises the display device.
11. A method, comprising: determining, with a video calling device, a position of a first party to a video call relative to a display device that displays video of a video call; and adjusting an apparent view of the video call, based at least in part on the determined position of the first party to the video call.
12. The method of claim 11, wherein the video calling device comprises: a video input interface to receive video input from a set-top box; an audio input interface to receive audio input from the set-top box; a video output interface to provide video output to the display device; an audio output interface to provide audio output to an audio receiver; a video capture device to capture video; an audio capture device to capture audio; a network interface; at least one processor; and a storage medium in communication with the at least one processor, the storage medium having encoded thereon a set of instructions executable by the at least one processor to cause the video calling device to: control the video capture device to capture a captured video stream; control the audio capture device to capture a captured audio stream; encode the captured video stream and the captured audio stream to produce a series of data packets; and transmit the series of data packets on the network interface for reception by a second video calling device.
13. The method of claim 11, wherein adjusting an apparent view of the video call comprises adjusting an apparent field of view of the video call.
14. The method of claim 13, wherein determining a position of a first party comprises determining a distance of the first party from the display device.
15. The method of claim 14, wherein adjusting an apparent field of view of the video comprises zooming the video based on the determined distance of the first party from the display device.
16. The method of claim 13, wherein determining a position of a first party comprises determining a horizontal position of the first party in a horizontal dimension of a plane parallel to a face of the display device.
17. The method of claim 16, wherein adjusting an apparent field of view of the video comprises panning the video in a horizontal direction, based on the determined horizontal position of the first party.
18. The method of claim 13, wherein determining a position of a first party comprises determining a vertical position of the first party in a vertical dimension of a plane parallel to a face of the display device.
19. The method of claim 18, wherein adjusting an apparent field of view of the video comprises panning the video in a vertical direction, based on the determined vertical position of the first party.
20. The method of claim 11, wherein adjusting an apparent view of the video call comprises modifying, at the video calling device, a video signal received by the video calling device.
21. The method of claim 11, wherein the video is received from a second video calling device, and wherein adjusting an apparent view of the video call comprises instructing the second video calling device to adjust a view of one or more cameras of the second video calling device.
22. The method of claim 21, wherein instructing the second video calling device to adjust a view of one or more cameras comprises instructing the second video calling device to adjust a field of view of the one or more cameras.
23. The method of claim 21, wherein the second video calling device comprises an array of cameras, and wherein the field of view of the one or more cameras comprises a field of view of a composite image captured by a plurality of cameras within the array of cameras.
24. The method of claim 23, wherein the apparent view of the video call comprises a virtual perspective of the composite image.
25. The method of claim 24, wherein the virtual perspective represents a perspective of the first party to the video call relative to the display device.
26. The method of claim 21, wherein instructing the second video calling device to adjust a view of one or more cameras comprises instructing the second video calling device to adjust a perspective of the one or more cameras.
27. The method of claim 21, wherein instructing the second video calling device to adjust a view of one or more cameras comprises instructing the second video calling device to pan a camera in at least one of a horizontal dimension or a vertical dimension.
28. The method of claim 21, wherein instructing the second video calling device to adjust a view of a camera comprises instructing the second video calling device to zoom a camera.
29. The method of claim 21, wherein instructing the second video calling device to adjust a view of a camera comprises instructing the second video calling device to crop frames of a video stream captured by the camera.
30. The method of claim 11, further comprising: determining, with the video calling device, that the first party has moved relative to the display device; and modifying the apparent view of the video call, in response to determined movement of the first party.
31. The method of claim 30, wherein modifying the apparent view of the video call comprises modifying an apparent perspective of the video call, in response to determined movement of the first party.
32. The method of claim 30, wherein modifying the apparent view of the video call comprises modifying the apparent view of the video call substantially in real time with the determined movement of the first party.
33. The method of claim 11, wherein the video calling device comprises a camera, and determining a position of a first party to a video call comprises capturing one or more images of the first party with the camera.
34. The method of claim 33, wherein the one or more images comprise a video stream.
35. The method of claim 34, further comprising transmitting the video stream to a second video calling device as part of the video call.
36. The method of claim 33, wherein determining a position of a first party to a video call further comprises analyzing the one or more images to identify the position of the first party.
37. The method of claim 36, wherein analyzing the one or more images comprises identifying, in the one or more images, positions of one or more eyes of the first party to the video call.
38. An apparatus, comprising: a computer readable medium having encoded thereon a set of instructions executable by one or more computers to cause the one or more computers to: determine a position of a first party to a video call relative to a display device that displays video of a second party to the video call; and adjust an apparent view of the video of the second party to the video call, based at least in part on the determined position of the first party to the video call.
39. A system, comprising: a video calling device, comprising: at least one first processor; and a first computer readable medium in communication with the at least one first processor, the first computer readable medium having encoded thereon a first set of instructions executable by the at least one first processor to cause the video calling device to: determine a position of a first party to a video call relative to a display device that displays video of a second party to the video call; and a computer, comprising: one or more second processors; and a second computer readable medium in communication with the one or more second processors, the second computer readable medium having encoded thereon a second set of instructions executable by the one or more second processors to cause the computer to: adjust an apparent view of the video of the second party to the video call, based at least in part on the determined position of the first party to the video call.
40. The system of claim 39, wherein the video calling device comprises the computer.
41. The system of claim 39, wherein the video calling device comprises a first video calling device, the system further comprising a second video calling device that comprises a camera that records the video of the second party to the video call.
42. The system of claim 39, wherein adjusting an apparent field of view of the video of the second party to the video call comprises transmitting, to the second video calling device, instructions for adjusting a field of view of the camera of the second video calling device.
43. The system of claim 39, wherein the computer is a control server separate from the video calling device.
44. The system of claim 39, wherein the computer is incorporated within a second video calling device that further comprises a camera that captures the video of the second party to the video call.
45. The system of claim 39, wherein the video calling device comprises: a video input interface to receive video input from a set-top box; an audio input interface to receive audio input from the set-top box; a video output interface to provide video output to a display device; an audio output interface to provide audio output to an audio receiver; a video capture device to capture video; and an audio capture device to capture audio; wherein the first set of instructions further comprises instructions executable by the first processor to cause the video calling device to: control the video capture device to capture a captured video stream; control the audio capture device to capture a captured audio stream; encode the captured video stream and the captured audio stream to produce a series of data packets; and transmit the series of data packets on the network interface for reception by a second video calling device.